
Typical Mistakes in Agile Approaches and How to Avoid Them

Watch the Webinar

In our webinar ‘Typical Mistakes in Agile Approaches’ we’ll explore the world of Agile Project Management, introducing Scrum as a powerful framework.

We’ll dive into the Data Vault 2.0 methodology for data integration in DWHs. We’ll also discuss common mistakes when transitioning from Waterfall to agile approaches, including challenges specific to Data Vault and Scrum, and offer practical guidance.

Join us to uncover common pitfalls and mistakes encountered in Agile Project Management and how to avoid them.

Watch Webinar Recording

Webinar Agenda

1. Get started with project management
2. Getting to know Scrum and agile project management: where are the pitfalls?
3. How does agile project management fit Data Vault 2.0?
4. How to avoid the pitfall of not delivering business value

Seamless Agile Project Management in the BI Landscape

Agile project management has gained significant popularity in the corporate world due to its emphasis on collaboration, customer feedback, and continuous development. This approach has found its way into Business Intelligence (BI) projects, yet many companies still encounter common challenges or struggle to fully embrace agile methodologies.

A critical aspect of agile project management in BI involves effective collaboration among different teams. In today’s fast-paced business environment, projects often necessitate contributions from multiple departments or teams with diverse skill sets.
However, communication gaps, conflicting priorities, and differences in work styles can pose significant obstacles to smooth coordination and project success. Overcoming these challenges requires fostering open communication channels, establishing clear protocols for inter-team interactions, and cultivating a culture of mutual respect and understanding.

In the context of Data Vault and BI projects, several touchpoints demand collaboration from various departments or teams. For instance, gathering background information about the data source, ensuring privacy compliance, and obtaining well-defined user requirements are all critical components. Implementing well-defined processes, where everyone understands their responsibilities, can streamline these tasks. For example, ensuring that privacy tagging precedes the implementation of the Raw Data Vault enables accurate Satellite splits. Additionally, clear requirements from users or reporting teams, in the form of User Stories or Question Stories, are fundamental for smooth project execution. The figure below shows a high-level example of the development process of a dashboard in a typical BI project and the different teams and people involved.

[Figure: high-level development process of a dashboard in a typical BI project, including the teams and people involved]

Another common issue in BI projects is the excessive focus on technical aspects. Teams often dedicate significant time to building the entire technological infrastructure, neglecting the timely delivery of business value. At Scalefree, we advocate prioritizing business value from the project’s outset. We endorse the tracer bullet approach as an effective method, contingent on well-defined requirements. These issues are just a few examples of the challenges often faced in BI projects.

 

If you’re unfamiliar with the tracer bullet approach, don’t worry. We’ll dive into this topic and more in our webinar, “Typical Mistakes in Agile Approaches”. We will also take a look at Scrum and the Data Vault 2.0 methodology, uncover common mistakes in agile project management, and learn how to avoid them. Watch the recording here for free!

OUR NEW MONTHLY EXPERT SESSION

To meet the growing interest in and challenges of agile project management, Scalefree is introducing “Insight Agile Projects,” a monthly expert session designed to answer questions, share insights, and enhance collective knowledge in agile project management. Join us every second Thursday of the month to gain valuable insights from our experienced project management experts, covering topics such as requirements analysis, effort estimation processes, stakeholder and people management, and their integration with the Data Vault 2.0 methodology.

Don’t miss the opportunity to enhance your project management proficiency with our expert guidance. Mark your calendars for the upcoming webinar and join our monthly expert sessions to gain invaluable insights into agile project management.

 

– Lorenz Kindling (Scalefree)

Data Mesh & Data Vault: Raw Data vs. Information

Watch the Video

In the ongoing Data Vault Friday series, our CEO, Michael Olschimke, addresses a pertinent question raised by the audience, focusing on the intersection of Data Mesh and Data Vault methodologies.

“In a data mesh/federation approach, can you share the raw data vault or the business vault across the enterprise or does the share only happen in the information mart? I believe it may happen some raw vault tables of a data domain to be joined with a raw vault of another data domain.”

Michael delves into the intricacies of Data Mesh and Data Vault methodologies, shedding light on the dynamics of sharing data across different domains within an enterprise. He provides insights into the scenarios where sharing may occur, discussing the potential of joining Raw Vault tables from disparate data domains. This elucidation aims to guide practitioners in navigating the nuances of data sharing in a federated environment.

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Reference Hubs and Effectivity Satellites in Data Vault

Watch the Video

In the ongoing Data Vault Friday series, our CEO, Michael Olschimke, delves into two insightful questions raised by our audience about Reference Hubs and Effectivity Satellites.

“Should the Satellite hang from the Master Hub or Link? Our preference was to hang from the Hub as it kept the model simple and also to keep the CDC straightforward. Another reason we are leaning toward this is, that the descriptive columns belong to the Master Hub so it would be ideal to keep it as a satellite under the hub.”

“Now with the link, if there is any change in the relationship between the Master Hub Column to the Reference Hub Column in the link we would like to capture it. And it is via effectivity satellites. If we have One Master Hub column and a lot of the Reference Hubs columns do we end up in a lot of Effectivity Satellites? Or just one effectivity satellite as the driving key is the Master Hub column?”

Michael provides insights into these considerations regarding Reference Hubs and Effectivity Satellites, offering guidance on the optimal structuring of Satellites and managing Effectivity in Link relationships.

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Natural Key vs. Technical ID in Data Vault

Watch the Video

In our continuous exploration of Data Vault concepts during the Data Vault Friday series, our CEO, Michael Olschimke, takes on a thought-provoking question from a member of our audience regarding natural keys.

“Can a work order ID or recommendation ID be considered a natural business key even though there is no business ‘meaning’ to these ids? or should I use the text description of the recommendation as the business key? The work order would have to be a composite key made up of details and a date for the work order which at that point seems like just using the ID makes more sense.”

Michael addresses the nuanced concept of Natural Key and Technical ID in the context of work order and recommendation IDs. The discussion provides valuable insights into making informed decisions about choosing business keys in scenarios where the identifiers lack inherent business meaning.

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Snackable Data Insights: Analytics Meets Salesforce Data Cloud

Watch the Webinar

What are we going to do with all that data? It’s a question you might ask yourself when checking the current size of your data lake or data warehouse.

In our webinar, we will explore together how we can make use of our data residing in Snowflake using Salesforce Data Cloud or Tableau. Extract value from your data now!

Join us to learn more about the newest analytics tools inside the Salesforce ecosystem and how you can use them in your data architecture.

Watch Webinar Recording

Webinar Agenda

1. Recap DDL ’23
2. Ask the Right Questions
3. Issues
4. (New) Tools

Meet the Speaker

Picture of Markus Lewandowski

Markus Lewandowski

Markus Lewandowski is a Consultant at Scalefree with over six years of experience in Salesforce development and administration. As a Certified Data Vault 2.0 Practitioner (CDVP2™), he takes on a hybrid role as a Salesforce specialist and DV 2.0 practitioner. His main competences are Salesforce process automation, application integration, and data management.

HL7 FHIR resources in Data Vault

Watch the Video

In our continuous exploration of Data Vault concepts during the Data Vault Friday series, our CEO, Michael Olschimke, takes on a compelling question posed by a member of our audience.

“How would you model data that is transmitted as HL7 FHIR resources in Data Vault?”

Michael delves into the intricacies of handling HL7 FHIR resources within the Data Vault framework. This session is a valuable resource for those seeking insights into the effective modeling of healthcare data encoded in HL7 FHIR resources.

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Business Satellites in Data Vault

Watch the Video

In the latest segment of our Data Vault Friday series, our esteemed CEO, Michael Olschimke, delves into a question brought forward by a member of our audience.

“What are business satellites?”

Michael passionately explores the concept of business satellites in a dedicated session, offering in-depth insights and valuable perspectives on their significance within the Data Vault methodology. This engaging discussion is geared towards enhancing your understanding of business satellites and their role in the broader context of Data Vault architecture.

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Data Vault Referential Integrity

Watch the Video

In the latest installment of our Data Vault Friday series, our accomplished Managing Consultant, Marc Finger, addresses a pertinent question posed by a member of our audience.

“How is referential integrity handled in DV 2.0?”

Marc provides comprehensive insights into the intricacies of maintaining referential integrity within the Data Vault 2.0 framework. He explores the nuances and best practices associated with ensuring a robust and reliable structure that upholds the integrity of relationships within the Data Vault architecture.

Meet the Speaker

Profile picture of Marc Finger

Marc Finger

Marc works in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on Data Vault 2.0 implementation and coaching. Since 2016 he has been active in consulting on and implementing Data Vault 2.0 solutions with industry leaders in the manufacturing, energy supply, and facility management sectors. In 2020 he became a Data Vault 2.0 Instructor for Scalefree.

Technical Tests of a Data Vault Powered EDW

Data Vault Powered EDW

In this newsletter, we’ll give an overview of different methods and approaches for performing technical tests of a Data Vault-powered EDW.

The testing approaches described below aim to ensure the integrity, reliability, accuracy, consistency, and auditability of the data loaded into your Data Vault entities, as well as of the information marts built on top of them, so that it is safe for your business to make decisions based on that data.

Technical Tests and Monitoring of a Data Vault powered EDW

In this webinar, our experts will give you an overview of different methods and approaches for technical testing and monitoring of a Data Vault-powered EDW. The testing approaches to be discussed are suitable for different layers of your EDW solution, starting with extracting data from sources to the landing zone/staging area (Extract and Load) and ending with the information marts used by end users in their BI reports. The main focus of the webinar, though, is testing the Data Vault 2.0 entities in the Raw Vault and Business Vault layers. The monitoring part focuses on providing insights into the performance of your EDW. Starting with the modeling approach of the metrics vault and metrics marts, the sources of the data for these entities will be covered. The data captured provides information about the execution of your ELT processes as well as error information. By inspecting the error marts, you can track errors, find their root causes, or boost performance by taking performance metrics into account.

Watch Webinar Part 1 | Watch Webinar Part 2

What to expect

You will receive an overview of testing approaches suitable for different layers of your EDW solution, starting with extracting data from sources to the landing zone/staging area (Extract and Load) and ending with the information marts used by end users in their BI reports. Additionally, we will discuss test automation and its importance for the continuous integration of your Data Vault-based EDW. That said, the main focus of this newsletter is testing the Data Vault entities in the Raw Vault and Business Vault layers.

Testing Data Extraction Process

Regardless of where the data extraction takes place (data source, persistent staging, or transient staging), the main goal of testing at this phase is to prove that there is no leakage while transporting or staging the data. Comparing the input data with the target data ensures that the data has not been accidentally or maliciously erased, added, or modified by any issue in the extraction process.
Checksums, hash totals, and record counts should be used to ensure the data has not been modified; a minimal sketch of these checks follows the list below:

  • Ensure that checksums over the source dataset and the target staging table are equal
  • Ensure that the numerical sum of one or more fields in a source dataset (the hash total) matches the sum of the respective columns in the target table. Such a sum may include data not normally used in calculations (e.g., numeric ID values, account numbers, etc.)
  • Make sure the row count between the source and the target staging table matches
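
The sketch below uses hypothetical datasets and column names; in practice these comparisons would typically run as SQL against your staging area:

    # Minimal sketch of extraction checks: row count and hash total.
    # The rows and column names below are hypothetical examples.
    source_rows = [
        {"customer_id": 101, "balance": 250.00},
        {"customer_id": 102, "balance": 75.50},
    ]
    staged_rows = [
        {"customer_id": 101, "balance": 250.00},
        {"customer_id": 102, "balance": 75.50},
    ]

    def hash_total(rows, column):
        # Numerical sum over a column; may include values not normally
        # summed, such as numeric IDs or account numbers.
        return sum(row[column] for row in rows)

    # Row count test: no records were lost or duplicated in transport.
    assert len(source_rows) == len(staged_rows)

    # Hash total test: the sums per column match between source and target.
    for column in ("customer_id", "balance"):
        assert hash_total(source_rows, column) == hash_total(staged_rows, column)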

Testing Data Vault

The core part of your Data Vault-powered EDW solution is the Raw Data Vault, which contains raw, unfiltered data from your source systems that has been broken apart and loaded into hubs, links, satellites, and other Data Vault-specific entities based on business keys. This is the first point in the data pipeline at which the data lands in Data Vault-modeled entities. Thus, specific tests are required to ensure the consistency and auditability of the data after the Raw Data Vault is populated. The test approaches below are valid for Business Vault entities as well.

Testing Hubs

Hubs store business keys, separating them from the rest of the model. A hub is created for every business object and contains a unique list of keys with the same semantic meaning and granularity. The business objects residing in a hub are then referenced from other Data Vault entities through hash keys calculated during the staging phase.

As such, the following tests are necessary to perform on hubs to ensure their consistency:
For a hub with a single business key, tests should ensure that:

  • A hub contains a unique list of business keys (primary key (PK) test)
  • The business key column contains no NULL or empty values (except when the business key is composite)

If a hub has a composite business key, ensure that:

  • The combination of values in the business key columns is unique (PK test)
  • The business key columns do not all contain NULL or empty values at once

The validity of the latter point also depends on the nature of the business objects themselves. It may also be that NULLs or empty values are not allowed in any of the business key columns.

For both kinds of hubs, ensure that the hash key column contains (a sketch of these hub checks follows the list):

  • A unique list of values (PK test)
  • No NULL or empty values
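
The sketch uses hypothetical in-memory rows and illustrative helper functions; in practice these would typically be SQL assertions or framework tests:

    # Sketch of hub consistency tests on hypothetical rows.
    hub_customer = [
        {"hk_customer": "a1b2", "customer_no": "C-1001"},
        {"hk_customer": "c3d4", "customer_no": "C-1002"},
    ]

    def assert_unique(rows, columns):
        # PK test: the combination of the given columns is unique.
        values = [tuple(row[c] for c in columns) for row in rows]
        assert len(values) == len(set(values)), f"duplicates in {columns}"

    def assert_populated(rows, column):
        # No NULL (None) or empty values in the given column.
        assert all(row[column] not in (None, "") for row in rows)

    assert_unique(hub_customer, ["customer_no"])  # unique business keys
    assert_unique(hub_customer, ["hk_customer"])  # unique hash keys
    assert_populated(hub_customer, "customer_no")
    assert_populated(hub_customer, "hk_customer")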

Testing Links

A typical link defines relationships between business objects by storing unique combinations of hash keys of the connected hubs. The primary key of the link or link hash key uniquely identifies such a combination. Thus, link tests should check that:

  • The combination of connected hub references (hub hash keys) is unique (PK test)
  • Every hub hash key value exists in the referenced hub
  • Hub references do not contain NULLs or empty values

Regarding the last bullet point, note that NULLs and empty values in hub references, as well as in the hash key columns of other Data Vault entities, are replaced with zero keys.
For transactional (non-historized) data, transactional key columns should be included in the uniqueness tests in addition to the columns with hub hash keys. Make sure that the transactional keys are populated as well. Such transactional keys are usually not hashed since, as a rule, no hubs are created for transactions.
And, as for hubs, make sure that the link hash key column contains unique values and no NULLs or empty values.
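
A minimal sketch of the link checks, again on hypothetical data with illustrative table and column names:

    # Sketch of link tests; hub key sets and link rows are hypothetical.
    hub_customer_keys = {"hk1", "hk2"}
    hub_product_keys = {"hk9"}

    link_sales = [
        {"hk_link": "lk1", "hk_customer": "hk1", "hk_product": "hk9"},
        {"hk_link": "lk2", "hk_customer": "hk2", "hk_product": "hk9"},
    ]

    # PK test: the combination of hub references is unique.
    pairs = [(r["hk_customer"], r["hk_product"]) for r in link_sales]
    assert len(pairs) == len(set(pairs))

    # Referential test: every hub hash key exists in the referenced hub.
    assert all(r["hk_customer"] in hub_customer_keys for r in link_sales)
    assert all(r["hk_product"] in hub_product_keys for r in link_sales)

    # Hub references contain no NULLs or empty values (zero keys instead).
    for r in link_sales:
        assert r["hk_customer"] not in (None, "")
        assert r["hk_product"] not in (None, "")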

Testing Satellites

Satellites store descriptive information (attributes) for business objects (residing in hubs) or relationships between business objects (residing in links). One satellite references either one hub or one link. Since descriptive information for business objects and relationships between them may change over time, a load date timestamp of a satellite record is added to the primary key of a satellite.

With the above said, tests for a satellite should make sure that:

  • The combination of a hub/link reference (the Hash Key) and the load date timestamp of a record is unique (PK test)
  • Every hub or link hash key value exists in the referenced hub or link
  • Hub or link references do not contain NULL or empty values

Multi-active satellites contain multiple active records at the same time. Thus, additional key columns (for example, a type code or sequence number) are needed to uniquely identify a record. These additional key columns have to be part of the uniqueness test of a multi-active satellite, and they should additionally be tested for the absence of NULL and empty values.
The approach for testing a non-historized satellite also differs slightly from testing its standard sibling. A non-historized satellite is a special entity type that contains descriptive attributes for every corresponding record in a non-historized link. The primary key of a non-historized satellite is the link hash key, so there is no need to include a load date timestamp in the primary key check. For a non-historized satellite, additionally make sure that it has a 1:1 relationship with the corresponding non-historized link: record counts in both entities should match exactly.
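
A minimal sketch of the standard satellite checks on hypothetical data; for a multi-active satellite, the additional key column would simply be appended to the primary key tuple below:

    # Sketch of satellite PK and reference tests; data is hypothetical.
    hub_keys = {"hk1", "hk2"}

    sat_customer = [
        {"hk_customer": "hk1", "load_date": "2024-01-01", "name": "Alice"},
        {"hk_customer": "hk1", "load_date": "2024-02-01", "name": "Alicia"},
        {"hk_customer": "hk2", "load_date": "2024-01-01", "name": "Bob"},
    ]

    # PK test: hash key + load date timestamp is unique.
    pk = [(r["hk_customer"], r["load_date"]) for r in sat_customer]
    assert len(pk) == len(set(pk))

    # Referential test: every parent hash key exists in the referenced hub.
    assert all(r["hk_customer"] in hub_keys for r in sat_customer)

    # No NULL or empty values in the hub reference.
    assert all(r["hk_customer"] not in (None, "") for r in sat_customer)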

Testing Other Data Vault Entities

There are other special entity types in Data Vault worth mentioning with regard to testing (a sketch of the PIT table checks follows the list):

  • Reference hubs and reference satellites: Testing approaches are similar to those for standard hubs and satellites. The only difference is that there are no hash keys; business keys are used directly.
  • Record source tracking satellites: A column representing a static source name is added to the primary key test.
  • PIT Table (Business Vault):
    • PK test – combination of the hub/link hash key and the snapshot date timestamp columns is unique
    • For every satellite reference, check that the pair of hub/link hash keys and the load date timestamp exists in the referenced satellite
    • Hub/link reference does not contain NULL or empty values
  • Bridge Table (Business Vault):
    • PK test – combination of a base link hash key and snapshot date timestamp columns is unique
    • For every hub and link reference, check that the hub/link hash key exists in the referenced hub or link
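
The PIT sketch below uses hypothetical data and names:

    # Sketch of PIT table tests on hypothetical data.
    # (hash key, load date) pairs that exist in the referenced satellite:
    sat_entries = {("hk1", "2024-01-01"), ("hk2", "2024-01-01")}

    pit_customer = [
        {"hk_customer": "hk1", "snapshot_date": "2024-02-01",
         "sat_load_date": "2024-01-01"},
        {"hk_customer": "hk2", "snapshot_date": "2024-02-01",
         "sat_load_date": "2024-01-01"},
    ]

    # PK test: hash key + snapshot date timestamp is unique.
    pk = [(r["hk_customer"], r["snapshot_date"]) for r in pit_customer]
    assert len(pk) == len(set(pk))

    # Satellite reference test: the (hash key, load date) pair must
    # exist in the referenced satellite.
    assert all((r["hk_customer"], r["sat_load_date"]) in sat_entries
               for r in pit_customer)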

General Tests for all Data Vault Entities

There are some tests applicable to all Data Vault entities.
Make sure that all Data Vault entities:

  • Contain zero keys instead of NULL keys
  • Have record source columns that are populated and correspond to the defined pattern (e.g., a regex). For example, check that the column contains the file path, where the name of the top-level folder represents the source system and the file name includes the timestamp of the data extraction (see the sketch after this list)
  • Don’t have NULL values in their load (snapshot) date timestamp columns
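
The sketch below assumes a hypothetical path convention (top-level folder = source system, file name with an extraction timestamp):

    import re

    # Hypothetical pattern: "<source_system>/<file>_<YYYYMMDDTHHMMSS>.csv"
    RECORD_SOURCE_PATTERN = re.compile(r"^[a-z0-9_]+/.+_\d{8}T\d{6}\.csv$")

    record_sources = [
        "crm/customers_20240115T093000.csv",
        "erp/orders_20240115T093000.csv",
    ]

    # Record source columns are populated and match the defined pattern.
    assert all(rs and RECORD_SOURCE_PATTERN.match(rs) for rs in record_sources)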

Testing Source Marts

The Source Mart is one facet of the Information Mart concept in Data Vault. It is a virtualized model on top of the Raw Data Vault that replicates the original source structures. It is great for ad-hoc reporting, offers high value to many data scientists and power users, and can also be used to test the consistency and auditability of the loading process into a Data Vault-powered EDW.

Source mart objects are intended to look the same as the respective source tables (including column names). If you have source marts implemented in your EDW, make sure to compare them against the respective source tables in the staging area after the data loading process: values and row counts of the source structures should match exactly those of the respective source mart objects. In the Data Vault community, this kind of test is also known as a “Jedi test”.

It is relatively easy to automate such a comparison and make it part of the loading process.
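
A minimal sketch of such an automated comparison on hypothetical rows; in practice this would usually be a set-based SQL comparison between the staging table and the source mart view:

    # Sketch of an automated "Jedi test" on hypothetical rows.
    def jedi_test(staging_rows, source_mart_rows):
        # Row counts must match exactly.
        assert len(staging_rows) == len(source_mart_rows)
        # Values must match exactly (order-insensitive full-row compare).
        assert sorted(staging_rows) == sorted(source_mart_rows)

    staging = [("C-1001", "Alice"), ("C-1002", "Bob")]
    source_mart = [("C-1002", "Bob"), ("C-1001", "Alice")]
    jedi_test(staging, source_mart)  # passes: same rows, different order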

Testing Hash Key and Hash Diff Calculations

Hash keys in Data Vault allow business keys to be integrated deterministically from multiple sources in parallel. They are the glue that binds the different Data Vault entities together.

Hash diffs, on the other hand, apply to the satellites and help identify differences in descriptive attributes during the data loading process.

It is important to introduce unit tests for the hash key and hash diff calculations used in your EDW to make sure the hashed values are calculated in accordance with the defined hashing standards. Read more about requirements and templates for hashing here. Test cases for such unit tests should cover as many combinations of data types and values (e.g., NULL and empty values) as possible to ensure the hashes are calculated consistently.
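
A minimal sketch of such a unit test; the delimiter, upper-casing, and NULL handling below are illustrative assumptions, so substitute the hashing standards defined for your own EDW:

    import hashlib

    def hash_key(*business_keys, delimiter="||"):
        # Assumed standard: trim, upper-case, NULL -> empty string,
        # concatenate with a delimiter, then hash.
        parts = ["" if k is None else str(k).strip().upper()
                 for k in business_keys]
        return hashlib.md5(delimiter.join(parts).encode("utf-8")).hexdigest()

    # Edge cases must hash consistently: casing, whitespace, NULLs,
    # empty strings, and non-string data types.
    assert hash_key("C-1001") == hash_key("c-1001 ")
    assert hash_key(None, "A") == hash_key("", "A")
    assert hash_key("C-1001", 42) == hash_key("C-1001", "42")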

If your EDW exists on different DBMS platforms (e.g., during a migration or due to data security regulations), the above test cases can be used to make sure that your hash calculations are platform-agnostic, meaning they produce the same result on every platform. A common use case is a link on an on-premises DBMS platform referencing a hub that has already been migrated to a cloud platform. Such unit tests can be run on both platforms to ensure consistent hashing during a migration.

Testing Business Rules

Unlike hard rules, which do not alter the content of the data and thus maintain auditability, soft rules (business rules) enforce the business requirements stated by the business users. Examples of business rules include:

  • Concatenation (last name and first name)
  • Standardizing phone numbers
  • Computing total sales (aggregation)
  • Coalescing, etc.

Apart from the relatively simple examples listed above, there may also be more complex business rules involving sophisticated calculations, data transformations, and complex joins. Depending on the use case, the results of applying such rules usually end up in the Business Vault (e.g., in a business satellite) and later in the Information Mart layer, where they are consumed by the business users. Thus, testing business rules is an important part of the information delivery process.

Business rules are usually also subject to unit testing, which must be done continuously during development and the CI process. To perform such a unit test, we need expected values, ideally provided by the business, e.g., an expected net sales value for a defined product or set of products in a specific shop on a named day, based on real data. The net sales calculation from our Business Vault is then tested against the given expected result.
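
A minimal sketch of such a unit test; the net sales rule and the expected figure are hypothetical stand-ins for values provided by the business:

    # Sketch of a business rule unit test with a hypothetical rule.
    def net_sales(order_lines):
        # Business rule (assumed): gross amount minus discount, summed.
        return sum(line["gross"] - line["discount"] for line in order_lines)

    order_lines = [
        {"gross": 100.0, "discount": 10.0},
        {"gross": 50.0, "discount": 0.0},
    ]

    # Expected value, ideally supplied by the business for this test case.
    EXPECTED_NET_SALES = 140.0
    assert net_sales(order_lines) == EXPECTED_NET_SALES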

Test Automation and Continuous Integration

All of the tests described above should be automated as much as possible and run by EDW developers during the development process. Passing tests should be an obligatory condition for introducing any change to your EDW code base. This is achievable by using DevOps tools and enabling continuous integration (CI) in your DWH development lifecycle. Running automated tests each time code is checked in or merged ensures that data consistency issues and bugs are detected early and fixed before they reach production. As a rule, a separate test (or CI) environment is created for the purpose of running automated tests.

Here are some general recommendations for creating and running a test environment:

  • Create the CI environment as similar as possible to the production environment
  • Create test source databases and source files derived from real data
  • The test source files and source databases should be small so tests can run quickly
  • The test source files and source databases should also be static so that the expected results are known in advance
  • Test both full load and incremental load patterns, since the logic of the two patterns differs in most cases
  • Run tests not only against the changes to be merged but also against all downstream dependencies, or even the whole loading process, to prevent regressions

Conclusion

In this newsletter, we provided an overview of different methods and approaches for technically testing a Data Vault-powered EDW.

We covered testing at different stages of the EDW load, including extracting data from the sources, loading the Data Vault entities, and the information delivery process, though the primary focus was on loading the Data Vault entities.

We also covered unit testing hash key & hash diff calculations.

It is important to make sure that your hashing solution is platform/tool agnostic, especially during the migration process.

We also learned that testing business rules is a key part of the information delivery process, since they interpret the data and define what business users see in their reports. We highlighted the importance of unit testing business rules and of cooperating with the business to define test cases and expected results.

Furthermore, we also stressed the significance of test automation during the development phase as well as for enabling continuous integration and provided recommendations for creating and running a test environment.

We go even deeper into this topic in our webinar. Make sure to watch the recording for free!

How Do I Integrate My Office World into Salesforce

Watch the Webinar

Discover in our webinar how to integrate your office world and all of a company's associated departments into the Salesforce environment.

Learn which departments can be mapped to which Salesforce clouds and how this optimizes collaboration, communication, and business processes.

Don't miss the opportunity to boost your efficiency and optimize your sales activities.

Register now for valuable insights and field-tested advice.

Watch Webinar Recording

Webinar Agenda

1. Salesforce Clouds
2. DocuSign
3. Collaboration
4. E-mail
5. Mowi
