Sample Page – Scalefree

Building a scalable Data Platform? In Data Vault Friday

Data Vault Hashing or Not?

Watch the Video

Exploring Data Vault 2.0: Managing Hashing Costs in Smaller Environments

In the evolving landscape of data management, Data Vault 2.0 stands out as a robust methodology designed for scalability, flexibility, and consistency across diverse technological environments. A crucial component of Data Vault 2.0 is the use of hashing for business keys (BKs) and hash diffs. Hashing ensures data integrity and efficiency, especially in distributed systems. However, the performance costs associated with hashing can sometimes become a significant concern. This blog post delves into the nuances of hashing in Data Vault 2.0, the trade-offs involved, and when it might be feasible to deviate from the standard approach.

In this article:

The Role of Hashing in Data Vault 2.0
Challenges of Hashing
Evaluating Hashing Alternatives
The Case Against Sequences
Hash Keys vs. Business Keys
- Hash Keys
- Business Keys
Performance Optimization Strategies for Hashing
Future Trends and Recommendations
Conclusion
Meet the Speaker

The Role of Hashing in Data Vault 2.0

Data Vault 2.0 leverages hashing to create unique, consistent identifiers for business keys and to detect changes in data efficiently. This method is technologically agnostic, meaning it can be implemented across various databases and data platforms, whether on-premises or in the cloud. The primary advantages of hashing include:

Consistency Across Systems: Hashing ensures that business keys are consistent and unique across different systems and regions.
Improved Query Performance: Pre-calculating hash diffs can make query execution faster and more efficient, transferring the computational load from query time to data loading time.
Simplified Data Integration: Hash keys provide a straightforward way to manage and integrate data from multiple sources, reducing the complexity of data joins.
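
To make the idea concrete, here is a minimal sketch of how a hash key and a hash diff could be computed. It is an illustration under assumptions, not the method of any particular tool: the MD5 algorithm, the `||` delimiter, the normalization rules, and the function names `hash_key` and `hash_diff` are all choices made for this example.

```python
import hashlib

DELIMITER = "||"  # assumed separator between concatenated values


def _digest(text):
    """Return a fixed-length (32 character) hex digest of the input text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hash_key(*business_key_parts):
    """Hash one or more business key parts.

    Assumed normalization: trim whitespace, upper-case, None becomes ''.
    """
    parts = ("" if p is None else str(p).strip().upper() for p in business_key_parts)
    return _digest(DELIMITER.join(parts))


def hash_diff(record, columns):
    """Hash the descriptive attributes of a record in a fixed column order."""
    parts = ("" if record.get(c) is None else str(record[c]).strip() for c in columns)
    return _digest(DELIMITER.join(parts))


# Hypothetical sample record, used only for illustration.
customer = {"customer_id": " c-1001 ", "name": "Ada", "city": "Hanover"}
hk = hash_key(customer["customer_id"])
hd = hash_diff(customer, ["name", "city"])
```

Because the key is normalized before hashing, `" c-1001 "` and `"C-1001"` yield the same hash key on any platform that applies the same rules, which is what makes the identifier consistent across systems.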

Challenges of Hashing

Despite its benefits, hashing can introduce performance challenges, particularly in the following scenarios:

Wide Tables: Calculating hash diffs for tables with a large number of columns can be computationally intensive.
Complex Hash Functions: Ensuring that hash functions generate unique strings can be complex and resource-heavy.
Hardware Limitations: On-premises environments with limited hardware capabilities might struggle with the additional computational load required for hashing.

Evaluating Hashing Alternatives

When faced with performance concerns, particularly in smaller, local solutions, it’s essential to consider whether deviating from the standard hashing approach would be beneficial. There are three primary options to consider:

Hash Keys: The default and recommended option for most environments, especially those involving distributed systems or diverse technologies.
Sequences: A legacy approach from Data Vault 1.0 that uses sequential numbers as identifiers.
Business Keys: Using the original business keys directly as identifiers.

The Case Against Sequences

Sequences, although a viable option, are generally not recommended in modern Data Vault implementations due to several drawbacks:

Lookup Overhead: Sequences require lookups during data loading, which can slow down the process significantly.
Orchestration Complexity: Managing sequences adds complexity to the loading process, particularly in real-time scenarios.
Distributed System Challenges: Sequences do not perform well in distributed environments where parts of the solution might reside in different locations (e.g., cloud and on-premises).
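
The lookup and distribution problems can be illustrated with a toy comparison. The class `SequenceHub` and the function `hashed_key` below are hypothetical names for this sketch; MD5 and the trim/upper-case normalization are assumptions.

```python
import hashlib


class SequenceHub:
    """Toy hub that assigns sequence numbers to business keys."""

    def __init__(self):
        self._ids = {}

    def key_for(self, business_key):
        # Every load has to look up (and possibly insert into) shared state.
        if business_key not in self._ids:
            self._ids[business_key] = len(self._ids) + 1
        return self._ids[business_key]


def hashed_key(business_key):
    """Derive the key from the business key alone, with no lookup."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


# Two independent environments that load the same keys in a different order.
on_prem, cloud = SequenceHub(), SequenceHub()
on_prem.key_for("C-1")
on_prem.key_for("C-2")
cloud.key_for("C-2")

sequence_keys_match = on_prem.key_for("C-2") == cloud.key_for("C-2")
hash_keys_match = hashed_key("C-2") == hashed_key(" c-2 ")
```

In this sketch the two environments assign different sequence numbers to the same business key unless they coordinate, while the hash key can be computed independently in each location.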

Hash Keys vs. Business Keys

When deciding between hash keys and business keys, the choice largely depends on the specific technology stack and the environment. Here are some considerations:

Hash Keys

Pros: Provide a consistent, fixed-length identifier that simplifies joins and queries across various systems. They are particularly beneficial in mixed environments.
Cons: Slightly higher computational cost during data loading compared to sequences. However, the consistent performance across queries often outweighs this drawback.

Business Keys

Pros: Directly using business keys can simplify the architecture in environments where the data platform supports efficient handling of these keys.
Cons: Can lead to complex and less efficient joins, especially in mixed or distributed environments.

Performance Optimization Strategies for Hashing

For environments where hashing performance is a concern, several optimization strategies can be employed:

Leverage Hardware Acceleration: On-premises environments can benefit from hardware acceleration, such as PCI Express cards with crypto chips, to offload hash computation from the CPU.
Utilize Optimized Libraries: Many platforms use highly optimized libraries (e.g., OpenSSL) for hash computations, which can significantly improve performance.
Incremental Loads: Ensure that performance evaluations consider multiple load cycles to capture the benefits of hash diffs during delta checks, not just initial loads.
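
The following sketch shows why the benefit of hash diffs only becomes visible over several load cycles: after the initial load, the delta check compares a single value per record instead of every column. The function names, the row layout, and the use of MD5 are assumptions made for this illustration.

```python
import hashlib


def hash_diff(record, columns):
    """Hash the descriptive attributes in a fixed column order."""
    parts = ("" if record.get(c) is None else str(record[c]).strip() for c in columns)
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()


def load_satellite(satellite, batch, key_column, payload_columns, load_date):
    """Append only new or changed records; return the number of inserted rows."""
    latest = {}
    for row in satellite:  # rows are appended in load order, so later rows win
        latest[row["key"]] = row["hash_diff"]

    inserted = 0
    for record in batch:
        hd = hash_diff(record, payload_columns)
        if latest.get(record[key_column]) != hd:  # one comparison per record
            row = {"key": record[key_column], "hash_diff": hd, "load_date": load_date}
            row.update({c: record.get(c) for c in payload_columns})
            satellite.append(row)
            latest[record[key_column]] = hd
            inserted += 1
    return inserted


# Hypothetical sample data for three load cycles.
satellite = []
day1 = [{"id": "A", "city": "Hanover"}, {"id": "B", "city": "Berlin"}]
day3 = [{"id": "A", "city": "Hamburg"}, {"id": "B", "city": "Berlin"}]

first = load_satellite(satellite, day1, "id", ["city"], "2024-01-01")
second = load_satellite(satellite, day1, "id", ["city"], "2024-01-02")
third = load_satellite(satellite, day3, "id", ["city"], "2024-01-03")
```

Here the initial load inserts every record, the unchanged second load inserts nothing, and the third load inserts only the changed record. A benchmark that measures only the first load would miss this effect.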

Future Trends and Recommendations

Looking forward, the evolution of data platforms and technologies might shift the balance towards using business keys more frequently. As Massively Parallel Processing (MPP) databases become more prevalent, their native support for efficient key management could make business keys a more attractive option. However, until such technologies are ubiquitous, the default recommendation remains to use hash keys for their broad compatibility and consistent performance.

Conclusion

Data Vault 2.0’s approach to hashing business keys and hash diffs provides significant advantages in terms of consistency, scalability, and performance. While the performance costs of hashing can be a concern, particularly in smaller environments with limited hardware, careful consideration of the available options and optimization strategies can mitigate these issues. Ultimately, the decision should be guided by the specific technological context and future-proofing considerations.

For most scenarios, hash keys remain the recommended approach due to their versatility and robustness in mixed and distributed environments. However, as technology evolves, the use of business keys might become more feasible, highlighting the importance of staying informed about the latest trends and advancements in data management.

Meet the Speaker

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Building a scalable Data Platform? In Data Vault Friday

Multi Active Satellites on Links

Watch the Video

In our ongoing series, our CEO Michael Olschimke addresses a complex question from the audience regarding the use of Multi Active Satellites (MAS) on Links within a Data Vault 2.0 model. This topic touches on advanced aspects of data modeling, particularly in the context of handling multiple active records.

The question posed was, “Can the Multi Active Satellites be used on LINKs too (considering that on Link we have the option of using the child dependent key)? Please ignore the fact that the link doesn’t have a Hash column on all HUB keys.” Michael’s response delves into the practical application of MAS on Links, an area that can greatly enhance the flexibility and scalability of data models. He explains that while traditionally Multi Active Satellites are used with Hubs to track multiple active records, their application on Links is feasible and beneficial. By leveraging the child dependent key, it is possible to maintain multiple active relationships between entities, which is particularly useful in scenarios where relationships are dynamic and subject to frequent changes.

Drawing on his 15 years of experience in Information Technology, with a focus on Business Intelligence over the past eight years, Michael offers a nuanced perspective on this topic. He highlights that while the absence of a hash column on all HUB keys might pose a challenge, it can be mitigated through careful design and implementation strategies. By ensuring that each Link is adequately documented and structured, organizations can effectively use MAS to capture the complexity of real-world relationships without sacrificing data integrity or performance.

In conclusion, Michael emphasizes the importance of flexibility and adaptability in data modeling. Implementing Multi-Active Satellites on Links can provide significant advantages in managing complex data relationships, allowing for more granular and accurate data analysis. This approach aligns with best practices in Data Vault 2.0 and supports the goal of creating robust, scalable, and responsive data architectures. Michael encourages practitioners to challenge conventional boundaries and explore innovative solutions to meet their unique data management needs.

Meet the Speaker

Michael Olschimke

Markus Lewandowski In Beginner, Salesforce

Unlock Success: Dive into the Salesforce Summer Release ’24!

Watch the Webinar

Gear up for success as we dive into the highly anticipated Salesforce Summer Release ’24 in our exclusive webinar, “Unlock Success: Dive into the Salesforce Summer Release ’24!” Gain a competitive edge by getting ahead of the curve with a sneak peek into the upcoming updates that are set to revolutionize your business. Join us for an insightful exploration of the latest features and enhancements before they’re even released, and ensure you stay steps ahead of the competition.

In this dynamic webinar, we’ll provide you with an insider’s look into what the Salesforce Summer Release ’24 has in store. From game-changing functionalities to transformative enhancements, you’ll discover how these updates can propel your business forward and drive greater efficiency, productivity, and success. Whether you’re a seasoned Salesforce user or new to the platform, this webinar offers invaluable insights to help you maximize the potential of Salesforce and stay at the forefront of innovation.

Don’t miss this exclusive opportunity to gain early access to the groundbreaking features of the Salesforce Summer Release ’24. Register now to secure your spot and embark on a journey towards unlocking success with Salesforce. Stay ahead of the competition and position your business for growth and prosperity in the ever-evolving digital landscape.

Watch Webinar Recording

Lorenz Kindling In Data Vault Friday

How to Track Soft Deletes in an Insert Only Data Vault 2.0 Architecture

Watch the Video

In our ongoing series, our BI Consultant Lorenz Kindling addresses a question from the audience about managing soft deletes in an insert-only data environment. This topic is particularly relevant for those in the field of data warehousing, where maintaining historical data integrity and accuracy is paramount.

The question posed was, “How to track soft deletes with insert only?” Lorenz’s response explores the complexities and best practices for implementing soft deletes within an insert-only framework. He explains that soft deletes involve marking records as inactive rather than physically removing them from the database. This approach is crucial for maintaining a comprehensive historical record and ensuring that data integrity is not compromised. Lorenz suggests using a specific status indicator or a flag within the data model to denote records that are logically deleted. This allows for efficient querying and reporting without the risk of losing historical data.
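
As a rough sketch of the flag-based idea, the example below keeps an insert-only list of status rows and adds a new row whenever a key disappears from, or reappears in, the source. It assumes that each load delivers a full extract of the source keys; the function names and the `is_deleted` column are hypothetical.

```python
def current_status(status_rows):
    """Return the latest deletion flag per key from the insert-only rows."""
    latest = {}
    for row in status_rows:  # rows are appended in load order
        latest[row["key"]] = row["is_deleted"]
    return latest


def track_deletes(status_rows, source_keys, load_date):
    """Insert status rows for keys that appeared, reappeared, or vanished.

    Existing rows are never updated or removed.
    """
    latest = current_status(status_rows)
    source_keys = set(source_keys)
    new_rows = []

    for key in sorted(source_keys):
        if latest.get(key, True):  # unseen so far, or currently flagged as deleted
            new_rows.append({"key": key, "is_deleted": False, "load_date": load_date})

    for key in sorted(latest):
        if key not in source_keys and not latest[key]:  # active key is missing
            new_rows.append({"key": key, "is_deleted": True, "load_date": load_date})

    status_rows.extend(new_rows)
    return new_rows


# Hypothetical load cycles: key "B" is missing from the second extract.
status = []
track_deletes(status, ["A", "B"], "2024-01-01")
deleted = track_deletes(status, ["A"], "2024-01-02")
```

Queries that should ignore logically deleted records can filter on the latest flag per key, while the full history of appearances and deletions stays available.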

Lorenz, who has been advising renowned companies since 2021 at Scalefree International, draws on his extensive experience in Business Intelligence and Enterprise Data Warehousing to provide practical insights. He emphasizes that by carefully planning and implementing a robust soft delete mechanism, organizations can achieve a balance between data retention and performance. Lorenz’s approach ensures that data warehouses remain both scalable and efficient, even as they grow and evolve over time.

In conclusion, Lorenz highlights the importance of adopting best practices in data warehouse automation and Data Vault modeling to manage soft deletes effectively. By using insert-only methods with proper indicators for soft deletes, organizations can maintain the integrity and usability of their data warehouses, thereby supporting long-term business intelligence and analytics goals. This strategy not only addresses common data warehousing challenges but also aligns with modern data management principles.

Building a scalable Data Platform? In Data Tools, dbt webinar, Intermediate

Scale Up your Data Vault Project – with dbt Mesh

dbt Mesh

Learn how dbt Mesh enhances Data Vault projects within dbt Cloud by facilitating a more efficient data mesh architecture. The larger a data warehouse project grows, the more people begin to rely on and work with the data provided. This work could be consuming the data, applying business rules, modeling facts and dimensions, or other typical tasks in a data environment. In a large organization, all these users might be scattered across different divisions, and the data they are working with might belong to different business domains. At some point, the entire organization faces the challenge of data sharing and governance guidelines, which might prohibit users of the sales department from accessing data from the finance department. A data mesh offers a solution that helps organizations deal with these challenges. If you want to learn more about the data mesh, check our recent blog article about Data Vault and data mesh here!

We also have a webinar on exactly this specific subject. Don’t miss it and watch the recording for free!

Data Mesh Support in dbt Cloud

Many organizations struggle with introducing a Data Mesh approach into the Data Vault landscape. In this webinar, we will dive into dbt Mesh, and how to leverage it in a Data Vault project.

Watch Webinar Recording

In this article:

What is dbt Mesh?
Why would I want to refer to other dbt projects?
How can I leverage dbt Mesh in a Data Vault powered Data Mesh?
Conclusion

What is dbt Mesh?

dbt Mesh is a recently added feature that makes dbt Cloud work more efficiently with a data mesh approach. The already familiar {{ ref() }} function is no longer limited to models within one dbt project; it can now also refer to models of other dbt projects.

Why would I want to refer to other dbt projects?

Imagine a big organization that uses dbt Cloud for their Data Vault implementation. The project might have 400 sources defined, 2000 models implemented, and is used actively by 30 developers. Out of these 30 developers, there might be 5 people specifically working on the Business Data Vault and Information Mart layer for finance-related objects. Another 5 developers are working on the same layers but for sales-related objects.

At some point, you might want to avoid finance people messing around with the sales-related dbt models, so a data mesh architecture is to be implemented. This would allow the organization to define policies regarding data sharing, data ownership, and other governance measures.

With dbt Mesh, both the Sales and the Finance team would get their own dbt project. Since both should be based on the same Raw Data Vault, an additional foundational dbt project is created exclusively for staging and Raw Data Vault objects. Both domain-specific dbt projects, sales and finance, can now refer to Raw Vault objects inside the foundational dbt project without physically replicating the data.

How can I leverage dbt Mesh in a Data Vault powered Data Mesh?

Define Data Contracts

dbt models, or groups of models, can now be configured to have data contracts. Inside the already familiar .yml files, models can now be set to be publicly available (within an organization), data owners can be assigned, and table schemas can be locked.

Create a Foundational dbt project

In a Data Mesh architecture, the most common way to implement Data Vault 2.0 is to have a commonly shared Raw Vault as a foundation, with both Business Vault and Information Marts divided by business domains. In dbt Mesh, this is reflected in a foundational dbt project that includes all staging and Raw Data Vault objects. Only the Raw Data Vault objects would be configured to be accessible by other dbt projects, since the staging models should not be used outside of Raw Data Vault models.
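
The access rule described above can be modelled in a few lines. This is a toy simulation, not dbt code: the class `Project` and the function `resolve_ref` are hypothetical, the model names are invented, and the sketch simplifies dbt's access levels (private, protected, public) to a single check that only public models may be referenced from another project.

```python
ACCESS_LEVELS = {"private", "protected", "public"}


class Project:
    """Toy stand-in for a dbt project: model name -> access level."""

    def __init__(self, name):
        self.name = name
        self.models = {}

    def add_model(self, model, access="protected"):
        if access not in ACCESS_LEVELS:
            raise ValueError(f"unknown access level: {access}")
        self.models[model] = access


def resolve_ref(projects, calling_project, target_project, model):
    """Resolve a reference, rejecting cross-project use of non-public models."""
    access = projects[target_project].models[model]
    if calling_project != target_project and access != "public":
        raise PermissionError(f"{target_project}.{model} is not public")
    return f"{target_project}.{model}"


# Foundational project: staging stays internal, Raw Vault objects are shared.
foundation = Project("foundation")
foundation.add_model("stg_orders")                # default: not shared
foundation.add_model("order_h", access="public")  # Raw Vault hub, shared
projects = {"foundation": foundation, "sales": Project("sales")}

shared = resolve_ref(projects, "sales", "foundation", "order_h")
```

A reference from the sales project to `stg_orders` would raise a `PermissionError` in this sketch, which mirrors the rule that staging models should not be used outside the foundational project.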

Add domain-level dbt projects

Based on the foundational Raw Vault dbt project, each domain team can now work in their own dbt project. They access the Raw Data Vault via the (extended) {{ ref() }} function and don’t have to worry about maintaining these Raw Vault objects. Additionally, they can define which of their artifacts might be useful for other domains and share these via their own data contracts.

Distribute Responsibilities

Typically, a power user does not create Hubs, Links, and Satellites, and it’s not their responsibility to ensure a reliable Raw Data Vault to build transformations on. Therefore, it is important to define responsibilities within each dbt project. In particular, objects that are shared outside of one project should always have data contracts and defined owners. This ensures that users of these shared objects can rely on them.

Conclusion

All in all, dbt Mesh offers a fantastic way to properly implement a true data mesh approach. It is especially relevant when different business domains of one organization are working together in dbt to create trustworthy deliverables. In most scenarios, it makes sense to start using dbt Mesh early, even if your project is not very large yet. Having clear responsibilities and data contracts always helps maintain trust and transparency for your data!

– Tim Kirschke (Scalefree)

Building a scalable Data Platform? In Data Vault Friday

Modelling Exchange Rates

Watch the Video

In our ongoing series, our CEO Michael Olschimke addresses a question from the audience about modelling daily exchange rates within the Data Vault framework for a non-banking industry. The query highlights a common challenge faced by many organizations: integrating and managing exchange rate data effectively.

The question posed was, “How would you model daily exchange rates in Data Vault 2.0 for a non-banking industry? We are already using a reference table for the list of currencies (I guess we would have currency as a hub in the banking industry, but that is not our case). Now we also need daily exchange rates for currency conversions in the datamart layer. I would start with a Link for exchange rates, but do we need to create a hub for currencies? How about existing references to currency in the existing model (currently in SAT, because we have currency as a reference table)?”

Michael’s response delves into the intricacies of data modelling in such scenarios. He suggests that even though your industry is non-banking, establishing a structured and scalable way to manage exchange rate data is crucial. Using a Link for exchange rates is a good starting point, but creating a hub for currencies could provide additional benefits. This hub would act as a central repository for all currency-related information, ensuring consistency and ease of access across different layers of the data architecture. Additionally, integrating existing references to currencies within the model can streamline operations and enhance the accuracy of financial data analytics.
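
A small sketch can show how such a structure would be used for conversions in the datamart layer. It is only an illustration: the set `currency_hub` stands in for a currency Hub, the dictionary `exchange_rates` stands in for a Link between two currencies with a dated rate attached, and the rates themselves are invented sample values.

```python
from datetime import date
from decimal import Decimal

# Stand-in for a currency Hub: one entry per currency code.
currency_hub = {"EUR", "USD"}

# Stand-in for the exchange rate Link and its daily rate,
# keyed by (from currency, to currency, rate date). Values are invented.
exchange_rates = {
    ("USD", "EUR", date(2024, 5, 2)): Decimal("0.93"),
    ("USD", "EUR", date(2024, 5, 3)): Decimal("0.92"),
}


def convert(amount, from_currency, to_currency, rate_date):
    """Convert an amount using the rate that is valid on the given date."""
    for code in (from_currency, to_currency):
        if code not in currency_hub:
            raise KeyError(f"unknown currency: {code}")
    if from_currency == to_currency:
        return amount
    rate = exchange_rates[(from_currency, to_currency, rate_date)]
    return (amount * rate).quantize(Decimal("0.01"))


converted = convert(Decimal("100"), "USD", "EUR", date(2024, 5, 3))
```

Keeping currencies in one place means a misspelled or unknown currency code is caught before any rate lookup, which is one of the consistency benefits a shared currency entity provides.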

In conclusion, Michael emphasizes the importance of a well-thought-out data architecture. By creating a dedicated hub for currencies and effectively linking exchange rate data, organizations can ensure more accurate and efficient currency conversions in their datamart layer. This approach not only aligns with best practices in data vault modeling but also supports the broader goal of maintaining data integrity and usability across the enterprise.

Meet the Speaker

Michael Olschimke

Julian Brunner In Data Vault Friday

How to Implement Data Quality Techniques

Watch the Video

In our latest video, BI Consultant Julian Brunner tackles a pressing query: “Where to implement data quality techniques? Is it possible to clean dirty data at the entry point into the raw data vault?” Data quality is foundational for informed decision-making, and Julian’s expertise shines as he navigates this critical terrain.

Julian highlights the importance of a holistic approach to data quality management, emphasizing the need for robust frameworks spanning the entire data lifecycle. Whether it’s validation rules, data profiling, or cleansing algorithms, proactive measures at every stage can fortify data integrity.

Julian Brunner In Beginner

Automating a Scalable Data Warehouse with Data Vault Builder

Watch the Webinar

Unlock the power of automation in your data warehouse with Data Vault Builder in our upcoming webinar. Dive into the intricacies of Data Vault 2.0 and discover why it’s tailor-made for automation, promising efficiency and scalability like never before. Whether you’re a seasoned data professional or just embarking on your data warehousing journey, this webinar offers invaluable insights into streamlining your processes and accelerating implementation.

During this joint webinar, you’ll delve into the core principles of Data Vault 2.0 and witness firsthand how Data Vault Builder revolutionizes the implementation process. Through a live demonstration, gain practical knowledge and actionable tips to optimize your data warehouse architecture. From overcoming common challenges to kickstarting your project with confidence, this session equips you with the tools and techniques needed to succeed in the world of data warehousing.

Don’t miss this opportunity to elevate your data warehousing game and leverage the full potential of automation with Data Vault Builder. Join us and discover how to transform your data infrastructure into a dynamic, scalable powerhouse. Whether you’re a data architect, analyst, or IT professional, this webinar promises to be a game-changer for your organization’s data strategy. Register now to secure your spot!

Watch Webinar Recording

Building a scalable Data Platform? In Data Vault Friday

Data Vault 2.0 Pre-Analysis aka Automation for the Poor

Watch the Video

In our ongoing series, our CEO Michael Olschimke discusses a question from the audience:

“In recent training from ScaleFree I saw a glimpse of an excel sheet that basically annotated data sources with data vault specific metadata. It had like plenty of Salesforce attributes in it together with annotations like: Business key, Link business key, Satellite descriptive attribute, etc.

Can you talk a bit about this metamodel? Can it be used to drive automated creation of the data vault structures?”

Michael stresses the irreplaceable role of pre-analysis in establishing a successful Data Vault 2.0 framework. He underscores the crucial nature of this stage, aligning our work with each client’s aspirations while staying true to the guiding wisdom of Dan Linstedt.

Meet the Speaker

Michael Olschimke

Hernan Revale In Data Tools

Exploring datavault4dbt: A Practical Series on the dbt Package for Data Vault 2.0 – Vol. 2: Standard Entities in the Raw Vault

Exploring datavault4dbt

In our initial post of this series, we delved into the creation of our staging layer using DataVault4dbt, an open-source package designed for Data Vault 2.0 within dbt. In this installment, we embark on the journey to construct our first standard Data Vault 2.0 model entities in the Raw Vault, including Hubs, Links, and Satellites. As in our previous post, we recommend staying up-to-date with the latest changes and adaptations in the DataVault4dbt package by referring to the project’s GitHub repository Wiki.

In this article:

Before We Start
- stg_orders
A. Standard Hub with datavault4dbt
- order_h
B. Standard Link with datavault4dbt
- order_customer_l
C. Standard Satellite Version 0 with datavault4dbt
- order_0s
D. Standard Satellite Version 1 with datavault4dbt
- order_s
Conclusion

Before We Start

Before we get started, ensure that you have the DataVault4dbt package correctly installed in your packages.yml file and that you’ve executed dbt deps.

For this tutorial, we’ll be using the TPCH Snowflake Sample Data. Moreover, we assume you’ve already established your staging model, which includes the calculation of hashkeys and hashdiffs. Here’s a snippet from our staging model’s configuration, which we’ll need later when creating the Raw Vault entities:

stg_orders

A. Standard Hub with datavault4dbt

Hubs are constructed based on a unique list of business keys, making their configuration relatively straightforward. In this example, we’ll be creating the Hub for orders:

order_h

hashkey: the hashkey name in the staging model
business_keys: name of the business key used as input for the previously mentioned hashkey
source_models: name of our staging model
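
Conceptually, loading a Hub means inserting every hashkey from staging that the Hub does not contain yet. The sketch below illustrates that idea in plain Python; it is not the code generated by the datavault4dbt macro, and the function `load_hub`, the column names `hk_order_h`, `o_orderkey`, `ldts`, and `rsrc`, and the sample values are assumptions.

```python
def load_hub(hub, staging_rows, hashkey, business_key, load_date, record_source):
    """Insert each business key once; return the number of new Hub rows."""
    known = {row[hashkey] for row in hub}
    inserted = 0
    for row in staging_rows:
        if row[hashkey] in known:
            continue  # already in the Hub, or a duplicate within this batch
        hub.append({
            hashkey: row[hashkey],
            business_key: row[business_key],
            "ldts": load_date,
            "rsrc": record_source,
        })
        known.add(row[hashkey])
        inserted += 1
    return inserted


# Hypothetical staging rows; the hashkeys are shortened placeholders.
staging = [
    {"hk_order_h": "a1", "o_orderkey": 1},
    {"hk_order_h": "b2", "o_orderkey": 2},
    {"hk_order_h": "a1", "o_orderkey": 1},  # duplicate order in the batch
]
order_h = []
new_rows = load_hub(order_h, staging, "hk_order_h", "o_orderkey", "2024-01-01", "TPCH")
```

Running the same batch again inserts nothing, which is why a Hub load can be repeated safely.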

B. Standard Link with datavault4dbt

Link models establish connections between business keys. In our case, we’ll create a connection between the previously formed Order Hub and the Customer Hub:

order_customer_l

link_hashkey: hashkey of the Link, generated using the foreign keys from the Hubs in the staging layer
foreign_hashkeys: a list of foreign hashkeys to be included in our link
source_models: name of our staging model

C. Standard Satellite Version 0 with datavault4dbt

Following Data Vault 2.0 standards, Version 0 Satellites are created as incremental tables. In our example, the Satellite will be connected to the previously generated Order Hub:

order_0s

parent_hashkey: name of the parent entity’s hashkey, in our case, the Order Hub
src_hashdiff: hashdiff already calculated on the staging model
src_payload: original columns used in the hashdiff calculation
source_model: name of our staging model

D. Standard Satellite Version 1 with datavault4dbt

Additionally, the Version 1 Satellite is a virtually generated entity created on top of our Version 0 Satellite. Beyond the materialization type, the main difference with the V0 Satellite is the introduction of a new column for calculating the load end date. The load end date will be useful for us downstream when dealing with PIT tables in the Business Vault.

order_s

sat_v0: name of the related Version 0 Satellite
hashkey: hashkey name of the parent entity, in our case, the order Hub
hashdiff: hashdiff already calculated on the staging model
ledts_alias: name of the load end date column to be generated
add_is_current_flag: when true, it generates a new column flagging the last loaded rows based on the load end date
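
The load end date logic can be sketched in plain Python as follows. This is an illustration of the concept, not the SQL that the package generates: the function `satellite_v1`, the column names, and the far-future placeholder date are assumptions, and the sketch simply uses the next load date as the end date without reproducing any particular offset convention.

```python
from datetime import datetime

END_OF_ALL_TIMES = datetime(8888, 12, 31, 23, 59, 59)  # assumed placeholder


def satellite_v1(v0_rows, hashkey="hk", ldts="ldts",
                 ledts_alias="ledts", add_is_current_flag=True):
    """Derive a load end date per row from the next load date of the same parent."""
    by_parent = {}
    for row in v0_rows:
        by_parent.setdefault(row[hashkey], []).append(row)

    result = []
    for rows in by_parent.values():
        rows = sorted(rows, key=lambda r: r[ldts])
        # Pair each row with its successor; the last row has none.
        for current, following in zip(rows, rows[1:] + [None]):
            out = dict(current)
            out[ledts_alias] = END_OF_ALL_TIMES if following is None else following[ldts]
            if add_is_current_flag:
                out["is_current"] = following is None
            result.append(out)
    return result


# Hypothetical Version 0 rows for two parent hashkeys.
order_0s = [
    {"hk": "a", "ldts": datetime(2024, 1, 1), "status": "O"},
    {"hk": "a", "ldts": datetime(2024, 1, 5), "status": "F"},
    {"hk": "b", "ldts": datetime(2024, 1, 2), "status": "O"},
]
order_s = satellite_v1(order_0s)
```

Because nothing is stored, the view can be recomputed at any time from the Version 0 Satellite, which is why the Version 1 Satellite can remain virtual.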

Conclusion

In this journey through the creation of Raw Vault standard entities, we’ve established a strong foundation for our Data Vault 2.0 architecture. By utilizing DataVault4dbt within dbt, we’ve simplified the development of Hubs, Links, and Satellites. These fundamental building blocks are the cornerstone of a robust and scalable data warehousing solution. As we progress in this series, we’ll continue to explore advanced concepts and delve into the intricacies of Data Vault modeling, preparing us to unlock the full potential of our data.

Building a scalable Data Platform? In Data Vault Friday

Data Vault PITs in PowerBI – Joining Type 2 Dimensions

Watch the Video

In our ongoing series, our CEO Michael Olschimke discusses a question from the audience:

“We would like to either build a semantic model or let end users build it themselves as star schemas.

In Infomarts we expose facts and dimensions.

Dimensions are based on PITs and expose all contextual attributes valid for a given snapshot date.

Now the problem is that facts and dimensions need to be joined not only by the main keys of the dimension, but also by a snapshot date. However, PBI allows only joins with 1 attribute.

What is a recommended way to tackle this?

I thought of introducing sequence numbers in PITs and exposing them in virtualized fact views, additionally exposing separate snapshot dimension that synchronizes snapshots of all the dims (otherwise we end up in cartesian join). However this defeats partitioning in the PITs (join over sequence number and not hashKey + SnapshotDate blocks partition pruning).”

Michael delves into an in-depth discussion on leveraging Data Vault’s Point-in-Time (PIT) tables within PowerBI, exploring how this integration enhances analytical capabilities and supports dynamic reporting in the realm of Big Data.

Meet the Speaker

Michael Olschimke

Hernan Revale In Beginner, Data Vault

Designing the Business Vault: Key Strategies for Effective Data Organization

Designing the Business Vault

Data Vault 2.0 has emerged as a comprehensive framework, offering agility, scalability, and adaptability. At the heart of this framework lies the Business Vault, a critical component for effective data organization and analysis in modern enterprises.

In this article, we will examine the key principles and strategies for designing a robust Business Vault within the context of Data Vault 2.0.

Designing the Business Vault: Key Strategies for Effective Data Organization

Join us for an insightful webinar on “How to design the Business Vault?” as we explore the critical role of the Business Vault within the Data Vault 2.0 framework. Discover how the Business Vault serves as a pivotal component for translating raw data into actionable insights, applying soft business rules to streamline end-user structure creation, and ensuring an efficient population of Information Marts.

Watch webinar recording

In this article:

Understanding Data Vault 2.0
Importance of Business Vault
Key Concepts of a Business Vault
Conclusion

Understanding Data Vault 2.0

Data Vault 2.0 represents a paradigm shift in data architecture, distinguishing itself from traditional warehousing methods. Its flexibility and scalability make it ideal for organizations navigating the complexities of modern data ecosystems.

Data Vault 2.0 architecture follows a multi-layer approach, consisting of the Staging Layer, the Enterprise Data Warehouse Layer, and the Information Marts Layer. By dividing our data architecture into multiple layers, we can respond to both the needs of the technical teams (i.e., historization, auditability, and data integration) and the requirements of the business users (i.e., quick access to relevant, well-organized information). This integrated approach ensures a harmonious synergy between technical and business objectives.

To achieve all these goals, Data Vault 2.0 proposes a subdivision inside the Enterprise Data Warehouse Layer: the Raw Vault and the Business Vault. The Raw Vault will receive and integrate the unaltered data from the source, while the Business Vault will translate the raw data into meaningful insights for informed decision-making.

Importance of Business Vault

The Business Vault serves as a middle ground between the Raw Vault and the Information Mart layers. It is an optional layer, sparsely built on top of the Raw Vault and normally virtualized. Unlike in the Raw Vault, in the Business Vault we apply soft business rules, i.e., rules that change the data.

This layer will be created to serve the business in different ways, such as the generation of query assistance entities or by precomputing calculated fields that later will be used on downstream layers. In other words, the Business Vault will host business-rule changed data and its purpose is to ease the creation of end-user structures.

Key Concepts of a Business Vault

A Business Vault will be modeled following the Data Vault 2.0 design principles. Nevertheless, it won’t necessarily follow the strict auditability requirements of the Raw Vault, as we can drop and recreate the Business Vault entities at any time. With the purpose of serving to populate the Information Mart more easily and efficiently, the entities will be created only if they are necessary for the business. This is also why the Business Vault usually keeps reusable business logic.

The types of entities we can typically find inside a Business Vault are Point-In-Time (PIT) and Bridge Tables, for query assistance; Computed Satellites or Links, for storing computed data; and Exploration Links, for connecting Hubs that were not previously connected in the Raw Vault. In addition, any other entities that are created on top of the Raw Vault using business logic and queried by the Information Marts layer belong to the Business Vault. For instance, we might need business logic to map instances of the same thing, thus creating a Business Same-as Link.

Conclusion

In data management, Data Vault 2.0 encompasses different aspects such as data modeling, methodology, and architecture. Distinguished by its versatility, this framework places a significant emphasis on agility and adaptability. In this sense, at the core of Data Vault 2.0 architecture lies a pivotal concept, the Business Vault, a key player for efficient data organization and analysis in modern enterprises.

The Business Vault, a flexible optional layer, interprets raw data into actionable insights, applying soft business rules. Its purpose is to streamline end-user structure creation by hosting processed data. Entities are created selectively, keeping reusable business logic. In essence, the Business Vault ensures the efficient population of Information Marts by focusing on business-critical data.

Interested in more? Watch the webinar recording here for free!
