Patterns for Deriving Data Vault Dimensions and Facts
If you’ve gone through a Data Vault training, you know that Data Vault is a pattern-driven methodology — from loading the Raw Data Vault all the way through business logic implementation in the Business Vault. But one question that consistently comes up, especially among practitioners bridging the gap between modeling and delivery, is this: How do you actually derive Dimensions and Facts from a Data Vault model?
It’s a great question, and the good news is that just like the rest of Data Vault, there are clear, repeatable patterns for doing it well. Let’s walk through them.
In this article:
- The Pattern-Based Nature of Data Vault Delivery
- Deriving Dimensions: The 80/20 Rule
- Why PIT Tables? The Case for Point-In-Time
- Deriving Facts: Links, Non-Historized Links, and Bridge Tables
- Grain Management: The Core Challenge of Fact Derivation
- PIT Tables for Snapshot Facts
- Beyond Dimensions and Facts: Wider Applicability
- Summary: The Patterns at a Glance
- Learn the Patterns, Apply Them Everywhere
- Watch the Video
The Pattern-Based Nature of Data Vault Delivery
Data Vault is not just a modeling technique — it’s an entire methodology built around patterns. There are patterns for loading the Raw Data Vault, patterns for modeling it, and patterns for implementing business logic in the Business Vault. The same is true for the information delivery layer, where you produce standard target entities like Dimensions and Facts (or 3NF entities, flat/wide schemas, and more).
Once you internalize these patterns, deriving your target entities becomes a predictable and even automatable process. That’s one of the most underappreciated strengths of Data Vault: the patterns repeat, and the more you use them, the faster and more confident you become.
Deriving Dimensions: The 80/20 Rule
Roughly 80% of your Dimensions will follow one core pattern: they are derived from a combination of a Hub, one or more Satellites, and — critically — a PIT Table (Point-In-Time Table).
The basic query pattern looks like this:
- Start with the PIT Table as your base
- Join the Hub to retrieve the business key
- Join the relevant Satellites for descriptive attributes
- Cherry-pick the attributes you need for your Dimension
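As a sketch of those four steps, assuming a customer Hub (`hub_customer`), two Satellites (`sat_customer_crm`, `sat_customer_erp`), and a PIT table (`pit_customer`) that stores one load-date pointer per Satellite per snapshot (all names here are illustrative, not a fixed standard):

```sql
-- Hedged sketch: table and column names are hypothetical.
CREATE VIEW dim_customer AS
SELECT
    h.customer_id,              -- business key from the Hub
    p.sdts         AS snapshot_date,
    s1.first_name,              -- descriptive attributes, cherry-picked
    s1.last_name,               --   from the CRM Satellite
    s2.credit_limit             --   and the ERP Satellite
FROM pit_customer          p
JOIN hub_customer          h  ON h.hk_customer  = p.hk_customer
LEFT JOIN sat_customer_crm s1 ON s1.hk_customer = p.hk_customer
                             AND s1.load_date   = p.sat_crm_ldts
LEFT JOIN sat_customer_erp s2 ON s2.hk_customer = p.hk_customer
                             AND s2.load_date   = p.sat_erp_ldts;
```

Note that every join is a simple equi-join: the PIT already resolved which Satellite delta is valid at each snapshot, so no range predicates or window functions are needed in the Dimension view itself.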
The remaining 20% of Dimensions come from Links rather than Hubs. In these cases, you start with a PIT Table built on a Link, join the Link itself for hash keys and Dimension references (often called Bridge Dimensions), and then join the Satellites to describe the relationship.
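A hedged sketch of this Link-based pattern, again with illustrative names (`pit_customer_store`, `link_customer_store`, `sat_customer_store`):

```sql
-- The PIT is built on the Link's hash key instead of a Hub's.
CREATE VIEW dim_customer_store AS
SELECT
    l.hk_customer,               -- Dimension references from the Link
    l.hk_store,
    p.sdts        AS snapshot_date,
    s.relationship_type          -- attribute describing the relationship
FROM pit_customer_store      p
JOIN link_customer_store     l ON l.hk_customer_store = p.hk_customer_store
LEFT JOIN sat_customer_store s ON s.hk_customer_store = p.hk_customer_store
                              AND s.load_date         = p.sat_ldts;
```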
In practice, this covers close to 99% of all Dimension derivations. There are always edge cases, but this two-pattern approach handles the vast majority of real-world scenarios.
The Data Vault Handbook:
Core Concepts and Modern Applications
Build Your Path to a Scalable and Resilient Data Platform
The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.
Why PIT Tables? The Case for Point-In-Time
If you’re new to PIT Tables, they can look intimidating at first — a structure full of hash keys and timestamps. But once you understand the pattern, it’s actually one of the most elegant tools in the Data Vault toolkit. And here’s the key insight: in modern implementations, PIT Tables are generated by automation tooling, so you rarely build them by hand. You learn how to use them, not how to construct them from scratch.
PIT Tables are essential for several reasons:
Supporting All SCD Types
A PIT Table provides the foundation for Slowly Changing Dimension (SCD) types 0, 1, and 2 — all from the same structure. The snapshot day timestamp embedded in the PIT Table allows you to control the grain and timeline of your Dimension flexibly.
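As a sketch of how one PIT structure can serve multiple SCD types (assuming a hypothetical `pit_customer` keyed by hash key and snapshot timestamp `sdts`):

```sql
-- SCD type 2: every snapshot row of the PIT becomes a version of the member.
SELECT hk_customer, sdts AS valid_snapshot, sat_ldts
FROM   pit_customer;

-- SCD type 1: the same structure, filtered to the most recent snapshot,
-- so each member appears once with its current attributes.
SELECT hk_customer, sdts, sat_ldts
FROM   pit_customer
WHERE  sdts = (SELECT MAX(sdts) FROM pit_customer);
```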
Delivering Stable Reports
Here’s a scenario every analyst will recognize: you pull a report at 9:00 AM and expect it to remain consistent throughout the day. But if your Dimension view simply joins the latest delta from a Satellite, any data update that arrives after 9:00 AM will silently change your report’s results.
A PIT Table solves this by identifying the latest valid delta for all hash keys at a specific snapshot timestamp — say, 9:00 AM. This decouples your information delivery from live, incoming data pipelines. Users get stable, predictable reports. The data warehouse and the source pipelines can operate independently without stepping on each other.
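In query terms, the decoupling is just a filter on the snapshot timestamp. A hedged sketch (names and the timestamp literal syntax are illustrative and vary by platform):

```sql
-- Pin delivery to the 09:00 snapshot; loads arriving later land in the
-- Raw Vault but do not change what this query returns during the day.
SELECT p.hk_customer,
       s.credit_limit
FROM   pit_customer p
JOIN   sat_customer s
       ON  s.hk_customer = p.hk_customer
       AND s.load_date   = p.sat_ldts
WHERE  p.sdts = TIMESTAMP '2024-05-01 09:00:00';
```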
This is why a PIT Table, though it may appear more complex at first, is ultimately the simpler choice, and the one better aligned with what business users actually need: consistent data at a known point in time, not real-time volatility bleeding into their dashboards.
Decoupling Delivery from Ingestion
Beyond stability, the PIT Table serves a broader architectural purpose: it creates a clean separation between the data loading process and the data delivery process. This is a hallmark of mature data warehouse design — and Data Vault makes it a first-class pattern rather than an afterthought.
Want to deepen your understanding of PIT Tables, Satellites, and the complete Data Vault modeling toolkit? Explore our Data Vault 2.1 Training & Certification to get hands-on with these patterns.
Deriving Facts: Links, Non-Historized Links, and Bridge Tables
Facts are a different story. Where Dimensions describe who and what, Facts capture events, transactions, and messages — think banking transactions, call records, IoT sensor readings, or logistics movements.
For Facts, your primary sources in a Data Vault model are:
- Non-Historized Links (NH Links) — ideal for transactional events that don’t change
- Dependent Child Links — for subordinate, granular transactional data
- Bridge Tables — for pre-aggregated or grain-shifted data
- Joined Link combinations — for cross-domain Fact derivations
If your underlying Link structure already matches the grain you need for your Fact, you can build a simple view on top of it and you’re done. Clean and efficient.
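In the simplest case, the Fact is nothing more than a thin view over the NH Link. A hedged sketch with hypothetical names:

```sql
-- Transactional Fact as a view on an NH Link whose grain already
-- matches the target; no aggregation or grain shift required.
CREATE VIEW fact_payment AS
SELECT
    l.hk_customer,          -- Dimension references (hash keys)
    l.hk_account,
    l.payment_amount,       -- measures carried on the NH Link
    l.payment_ts
FROM nh_link_payment l;
```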
Grain Management: The Core Challenge of Fact Derivation
Things get more interesting when your source data doesn’t naturally sit at the right grain for your target Fact. This is where Bridge Tables and grain shift operations come into play.
Coarsening Grain with GROUP BY
Imagine an NH Link that captures real-time vehicle positions — five messages per second. Your Fact table only needs one position per hour. You need to reduce the granularity.
You do this with a GROUP BY operation on a subset of the Hub references within the Link. The fewer Hub references you group by, the coarser your grain. The result of this aggregation is materialized into a Bridge Table, which then becomes the base for your Fact view.
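A sketch of this coarsening step, assuming a hypothetical `nh_link_vehicle_position` and a `DATE_TRUNC`-style function (available in PostgreSQL, Snowflake, and similar platforms):

```sql
-- Coarsen five-per-second positions to one row per vehicle per hour;
-- dropping further Hub references from the GROUP BY would coarsen more.
CREATE TABLE bridge_vehicle_position_hourly AS
SELECT
    l.hk_vehicle,
    DATE_TRUNC('hour', l.position_ts) AS position_hour,
    AVG(l.latitude)  AS avg_latitude,
    AVG(l.longitude) AS avg_longitude
FROM nh_link_vehicle_position l
GROUP BY l.hk_vehicle, DATE_TRUNC('hour', l.position_ts);
```

Materializing the result as a Bridge Table, rather than computing the aggregation inside the Fact view, keeps the expensive grain shift out of every downstream query.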
Refining Grain by Joining Links
The inverse is also possible. If you need a finer grain than any single Link provides, you join multiple Links on their shared Hub references. For example, if one Link connects Customer and Store, and another connects Store and Product, joining them on the shared Store reference produces a result with three Hub references — a finer granularity than either Link alone.
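The Customer/Store/Product example can be sketched as follows (Link names are illustrative):

```sql
-- Join two Links on the shared Store reference; the result carries three
-- Hub references (customer, store, product), a finer grain than either
-- Link alone.
SELECT
    cs.hk_customer,
    cs.hk_store,
    sp.hk_product
FROM link_customer_store cs
JOIN link_store_product  sp ON sp.hk_store = cs.hk_store;
```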
Other Grain Shift Operations
Beyond GROUP BY and joins, there are other grain-shifting operations worth knowing:
- DISTINCT operations — to remove duplicate rows at a desired level of aggregation
- JSON array expansion — splitting a JSON array into multiple rows is itself a grain shift, making rows finer
All of these operations can produce results that are materialized in Bridge Tables for performance and reuse, which then serve as the clean input layer for your Fact views.
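As one concrete example of a refining grain shift, here is a PostgreSQL-flavored sketch of JSON array expansion (syntax differs on other platforms, and the `items` column and table names are hypothetical):

```sql
-- One order row with a JSON items array becomes one row per item.
SELECT
    l.hk_order,
    item ->> 'sku'             AS sku,
    (item ->> 'quantity')::int AS quantity
FROM nh_link_order l
CROSS JOIN LATERAL jsonb_array_elements(l.items) AS item;
```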
PIT Tables for Snapshot Facts
There is one important exception to the “Links and Bridge Tables for Facts” rule: snapshot-based Facts. For periodic snapshots — daily inventory levels, end-of-month balances, weekly pipeline status — a PIT Table can serve as an excellent base for your Fact entity.
The catch is that PIT Tables are sometimes pruned over time to keep them manageable. If you need to preserve a long history of snapshots for your Fact, consider using a dedicated Bridge Table to maintain those snapshots independently from your operational PIT Tables. This gives you full control over retention and history without disrupting the PIT Table’s primary purpose.
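A hedged sketch of a periodic-snapshot Fact built directly on a PIT (hypothetical names; the same shape works against a dedicated snapshot Bridge Table if the PIT is pruned):

```sql
-- One row per account per snapshot day: end-of-day balance.
CREATE VIEW fact_account_balance_daily AS
SELECT
    p.hk_account,
    p.sdts       AS snapshot_date,
    s.balance    AS end_of_day_balance
FROM pit_account p
JOIN sat_account s
     ON  s.hk_account = p.hk_account
     AND s.load_date  = p.sat_ldts;
```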
Beyond Dimensions and Facts: Wider Applicability
The same patterns extend naturally to other target model types:
- 3NF (Third Normal Form) entities — derived using similar logic from Hubs, Links, and Satellites
- Flat and wide schemas — built by joining PIT Tables to Bridge Tables and then joining Satellites to pre-join all descriptive attributes into a single denormalized entity
- Fully denormalized Fact entities — start with a Bridge Table or Link, join PITs, then join the relevant Satellites to fold all Dimension attributes directly into the Fact
This last pattern — the fully denormalized Fact — is particularly powerful for downstream consumers like machine learning pipelines or self-service analytics tools that benefit from wide, flat tables. And it is built from the exact same building blocks.
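A sketch of the fully denormalized Fact, assuming a hypothetical `bridge_sales` that carries the snapshot timestamp alongside its measures:

```sql
-- Start from the Bridge, join the PITs at the same snapshot, then fold
-- Satellite attributes straight into the Fact.
CREATE VIEW fact_sales_wide AS
SELECT
    b.hk_customer,
    b.hk_product,
    b.total_amount,                 -- measure carried on the Bridge
    sc.first_name, sc.last_name,    -- customer attributes, pre-joined
    sp.product_name                 -- product attributes, pre-joined
FROM bridge_sales b
JOIN pit_customer pc ON pc.hk_customer = b.hk_customer
                    AND pc.sdts        = b.sdts
JOIN sat_customer sc ON sc.hk_customer = pc.hk_customer
                    AND sc.load_date   = pc.sat_ldts
JOIN pit_product  pp ON pp.hk_product  = b.hk_product
                    AND pp.sdts        = b.sdts
JOIN sat_product  sp ON sp.hk_product  = pp.hk_product
                    AND sp.load_date   = pp.sat_ldts;
```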
Summary: The Patterns at a Glance
| Target Entity | Primary Source | Pattern |
|---|---|---|
| Standard Dimension (80%) | PIT + Hub + Satellites | Join PIT → Hub → Satellites, select attributes |
| Bridge/Relationship Dimension (20%) | PIT on Link + Satellites | Join PIT → Link → Satellites |
| Transactional Fact | NH Links / Dependent Child Links | View on top of Link at correct grain |
| Aggregated Fact | Bridge Table (GROUP BY result) | Grain shift → materialize → view |
| Snapshot Fact | PIT Table or Bridge Table | Snapshot timestamps → Fact view |
| Denormalized Fact | Bridge + PIT + Satellites | Pre-join all descriptive attributes |
Learn the Patterns, Apply Them Everywhere
One of the most satisfying aspects of working with Data Vault is that once you’ve internalized these patterns, they compound. Each new project reinforces the same building blocks: Hubs, Links, Satellites, PIT Tables, Bridge Tables. The same logic for deriving a Dimension applies to deriving a 3NF entity. The same grain management principles for Facts carry over to denormalized wide tables.
Data Vault doesn’t ask you to reinvent the wheel for every project — it gives you a set of reliable, tested patterns and asks you to apply them consistently.
If you’re ready to master these patterns in depth — including hands-on work with PIT Tables, Bridge Tables, and full information delivery models — check out our Data Vault 2.1 Certification and Training. And if you’re just getting started, there’s a free Data Vault handbook available as a concise primer covering the core concepts, pros and cons, and when to use Data Vault in your architecture.
The patterns are there. Once you learn them, they’ll follow you everywhere.