As a database developer, administrator or user, you probably have already heard of Data Vault. But with an existing database solution in mind, you might come to the question of why you should use Data Vault in your enterprise data warehouse. This blog article discusses some features where Data Vault can shine and why you should use it.
Reason 1: Supporting multiple source systems and frequently changing relationships.
Data Vault 2.0 provides the most benefits when your data comes from many source systems or has constantly changing relationships.
Data Vault 2.0 works well for multi-source systems or those that have constantly changing relationships.
The reason why it works well for these systems is its ability to make adding attributes simple. If there is a change to one source system, that change does not need to show up within all source systems. Similarly, you can limit the number of places changes are made, as attributes are stored separately from structural data in satellites.
Additionally, it is easier to account for new and changing relationships by closing off one link and creating another. You don’t have to change the historical data to account for a new relationship or update an existing schema. You only need to account for the changes going forward. This brings enormous flexibility and scalability into your enterprise data warehouse.
Reason 2: Easily track and audit your data.
In a modern data environment, the data runs through various different stops. To still provide continuous data quality, it must always be clear where data has come from.
Figure 1: Data Lineage
Given the need to be able to easily track and audit data, Data Vault has made it easier to do so. It does so by inherently enabling auditing, as load times and record sources are required for every row. It also tracks all historical changes as satellites, including the load time as part of the primary key.
When an attribute is updated, a new record is created.
All of this auditing enables you to easily provide auditability for both regulatory and data governance purposes. And because you store all of your history, you can access data from any point in time.
Reason 3: Load data from multiple source systems more quickly.
Data Vault enables quicker data loading simply because a number of tables can be loaded at the same time in parallel. The model decreases dependencies between tables during the load process and simplifies the ingestion process by leveraging inserts only, which load quicker than upserts or merges. This also leads to less complexity.
Data Vault 2.0 is designed to integrate data from several source systems and may over perform in some situations.
In summary, if you have a small to medium-sized analytics requirement, with a small team of architects, designers, and engineers delivering a solution with data sourced from a few systems, then Data Vault may be inappropriate for your need.
If, however, you have a large project with many source systems leading to an enormous data integration challenge, then Data Vault can potentially add massive value to the project.
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.