Skip to main content
search
0
Scalefree Data Consulting Solutions What is Data Vault 2

What is Data Vault 2

Data Vault 2 is a business intelligence system that includes the components needed to realize your enterprise vision for data integration and information delivery. As a hybrid architecture, it combines the best aspects of third normal form and a star schema, resulting in a historical-tracking, detail-oriented, uniquely linked set of normalized tables.

Crucially, this robust foundation transforms your infrastructure into an AI-ready data platform. By ensuring that advanced analytics and machine learning models are fueled by highly reliable, fully auditable data, Data Vault 2 leverages original modeling concepts into a modern environment, enabling outstanding scalability, flexibility, and consistency. It is an open standard based on three major pillars: Architecture, Modeling, and Methodology.

Data Vault pillars: Architecture, Modeling, and Methodology

General Definitions

Before diving into how Data Vault 2 works, it is helpful to understand a few foundational concepts that are essential to the framework. For a complete, guided introduction, our Udemy course “Data Vault: An Introduction by Michael Olschimke” is a great place to start.

Business Keys

These are unique identifiers of relevant business objects (such as a customer, product, or order). They establish a common language across your enterprise, allowing different systems to integrate seamlessly. A good business key has a low propensity to change and is universally recognized.

Rules

Data Vault strictly separates rules into two types. Hard Rules are technical adjustments that adapt data for the platform without altering its fundamental meaning. Soft Rules (or business rules) transform the data to serve end-user requirements and business logic.

1. Data Vault Architecture

The Data Vault 2 architecture is built upon a highly scalable three-layer structure. This approach separates the complexities of data integration and historical tracking from the delivery of actionable insights to end users.

Data Vault Architecture

Staging Area

The initial layer where raw, unmodified data is collected from various source systems. Its primary purpose is to offload the burden from operational systems and prepare the data for the warehouse using only Hard Rules.

Integration Layer

This is the core of the platform, designed to create a single point of facts. It is divided into two parts:

  • Raw Data Vault: Stores data in its original, unmodified format, ensuring full historical context and strict auditability.
  • Business Vault: An intermediary layer where Soft Rules and calculations are applied to bridge the gap between raw data and end-user needs.

Information Delivery

The user-facing layer that transforms data into accessible formats like star schemas or flat tables. It delivers data via specialized marts tailored to specific needs, for instance:

  • Information Mart: Focuses on business metrics and KPIs.
  • Error Mart: Tracks data inconsistencies that violate hard rules.
  • AI Mart: A feature store derived from multiple data sources, used directly by data scientists to train machine learning and AI models.

2. Data Vault Modeling

Data Vault modeling is designed to explicitly meet the needs of today’s enterprise data platforms, managing big data and high velocity with ease. The model is based on three core entities that represent a business from its fundamental components:

Data Vault model core entities

Hubs

Represent core business concepts through a distinct list of business keys (e.g., a “Customer” or “Flight” Hub).

Serve as the connectors that illustrate how different Hubs (business concepts) interact and relate to one another (e.g., a patient visiting a doctor).

Satellites

Store the descriptive attributes and historical context associated with Hubs or Links. Every time an attribute changes, the Satellite logs it, creating a complete historical timeline.

To illustrate how these core entities work together, consider an aviation scenario:

The central business elements -such as airports, carriers, and flights- are modeled as distinct Hubs (HubAirport, HubCarrier, and HubFlight). To represent a complete flight and establish the relationship between these independent objects, a Link (LinkFlight) connects all three of these Hubs together. Finally, Satellites are attached to provide the necessary context and descriptive data. For instance, HubAirport has multiple Satellites (SatAirportLocation and SatAirportTZ) to separate and historically track different types of airport attributes.

Wondering how to model your own complex business scenarios? Bring your questions to Data Vault Friday, our live weekly Q&A webinar.

To ensure lightning-fast reporting and analytics, the Business Vault also utilizes Query Assistant Tables, such as Point-in-Time (PIT) and Bridge tables. These system-generated entities act as indexes to simplify complex data joins and drastically accelerate query performance for end-users.

The Image shows a Data Vault 2.0 Modeling example, which is described more closely in the text.

3. Data Vault Methodology

Data Vault 2 incorporates core engineering standards like Capability Maturity Model Integration (CMMI), Total Quality Management (TQM), and Six Sigma to ensure consistent, repeatable, and high-quality deployments.

However, modern Data Vault 2 environments have evolved to embrace highly agile, value-driven ways of working:

Disciplined Agile

An evolutionary approach that balances quality, speed, and adaptability. It encourages self-organizing teams to work iteratively, addressing technical debt at its root cause and delivering functional solutions quickly.

Product Mindset

Instead of viewing a data platform as a finite “project” with an end date, Data Vault 2 methodology encourages a product-based approach. The data platform is treated as an ongoing, evolving product that delivers continuous, incremental value to the business, ensuring it scales seamlessly alongside your organization’s needs.

Want to master these standards for your own team? Explore our Data Vault 2.1 Training & Certification.

The Data Vault Handbook:
Core Concepts and Modern Applications

Build Your Path to a Scalable and Resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free
Marc Winkelmann - Data Vault Trainer

Marc Winkelmann
Managing Consultant

Phone: +49 511 87989342
Mobile: +49 151 22413517

Close Menu