Skip to main content
search
0
Scalefree Knowledge Webinars Data Vault Friday How to Define SCD Type 2 Dimension Keys in a Data Vault Solution

SCD Type 2 Dimension Keys in Data Vault: Hash Keys, Sequences, and the PIT Table

Defining dimension keys in a Data Vault solution is one of those topics that seems straightforward until you get to Type 2 dimensions — and then the options multiply quickly. Should you use hash keys or sequences? Where do Type 2 keys come from, and how do they connect back to your facts? This post walks through the full picture, from the simplest Type 1 case all the way to the Dimension Hash Key pattern used for Type 2 slowly changing dimensions.



SCD Type 2 Dimension Keys: Starting with the Simple Case

For Type 0 and Type 1 dimensions — dimensions without history — the dimension key question is easy. Every Hub already contains exactly one hash key per business entity, and every Link contains one hash key per relationship. These Type 1 hash keys are already present throughout your model: in Non-Historized Links, Dependent Child Links, and Bridge Tables. You can use them directly as dimension keys in your view layer without generating anything new. It’s the lowest-effort, highest-compatibility option.

Hash keys also have a significant advantage over sequences in distributed environments. If your facts live in the cloud and your dimensions are generated on-premise, you can’t easily synchronize integer sequences between systems — the lookup dependencies alone make it impractical. Hash keys don’t have this problem. Hashing the same business key on two different systems produces the same hash value. A distributed Information Mart works cleanly with hash keys; with sequences, it becomes a coordination problem.

For more on how hash keys work in Data Vault and why they’re designed the way they are, the Scalefree blog covers the topic in depth.

When Sequences Make Sense — and How to Generate Them

The case for sequences is primarily storage. An MD5 hash value stored as a character string takes 32 bytes; a SHA-1 takes 40. A big integer takes 8 bytes. If storage is a genuine concern, converting character-based hash values to binary in the view layer is the first option to consider — it cuts the size in half with minimal effort and no structural changes.

If you still want integer sequences after that, there are two places to generate them. You can add a sequence column directly to the Hub or Link structure, used purely as a downstream dimension key rather than as an identifier. This works but creates a conceptual tension: after spending effort explaining why sequences aren’t used as Hub identifiers, reintroducing them in the same structure is confusing for anyone reading the model.

The cleaner approach is a Computed Satellite in the Business Vault, attached to the Hub or Link, that generates a new sequence value for every new record in the parent. It’s a simple business rule — new parent record, new sequence — and it keeps the sequence generation in the layer designed for computed values. The trade-off is an additional join when consuming the sequence downstream, but the design is explicit and the logic is easy to understand and maintain.

The Data Vault Handbook:
Core Concepts and Modern Applications

Build Your Path to a Scalable and Resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free

The Type 2 Challenge: Why Hub Hash Keys Aren’t Enough

Type 1 hash keys work for dimensions without history because the granularity is one row per business entity. Type 2 dimensions need finer granularity — one row per business entity per version over time. The hash key from the Hub doesn’t capture that; it’s the same value regardless of when you’re looking at the data.

What you need for a Type 2 dimension is a key that is unique not just per entity but per entity per point in time. In Data Vault, that key already exists — it’s generated as part of the PIT Table.

The Dimension Hash Key from the PIT Table

When producing a Type 2 dimension, you need a PIT Table anyway — it provides the snapshot-based granularity that drives the dimension’s history. The PIT Table’s alternate key is the combination of the parent’s business key (not the hash key — never hash a hash) and the snapshot date. The primary key of the PIT Table is a hash value computed from those two inputs: business key plus snapshot date.

At Scalefree, this value is called the Dimension Hash Key. It is unique per row in the PIT Table, which means it is unique per entity per point in time — exactly what a Type 2 dimension key needs to be. This Dimension Hash Key becomes the primary key of your Type 2 dimension and the foreign key that your fact entities need to reference in order to join to the correct dimension member at the correct point in time.

Connecting Facts to Type 2 Dimensions

The remaining challenge is on the fact side. Bridge Tables and Non-Historized Links — the typical foundations for fact entities — contain Type 1 hash keys from Hubs and Links, not Type 2 Dimension Hash Keys. So how does a fact row know which Type 2 dimension member to reference?

The solution is a join through the PIT Table’s alternate key inside the fact view. A Bridge Table typically contains the Type 1 hash key from the relevant Hub and a snapshot date. Those two values together form the alternate key of the PIT Table. Inside the fact view, you join the Bridge Table to the PIT Table using the hash key and snapshot date, retrieve the Dimension Hash Key from the PIT Table’s primary key, and surface that as the dimension reference in the fact entity.

The result: the fact entity contains a single column — the Dimension Hash Key — that points to exactly one Type 2 dimension member. The dashboard tool and end users never need to know how it was derived. The join logic is handled in the view layer, the keys match between fact and dimension, and the relationship resolves cleanly. This is the preferred approach rather than exposing a composite key (hash key plus snapshot date) from the fact side, which would complicate the dimensional model unnecessarily.

For teams using datavault4dbt premium, PIT Table generation and the Dimension Hash Key pattern are handled through the automation framework, which significantly reduces the manual effort involved in implementing this correctly at scale.

The Data Vault Handbook:
Core Concepts and Modern Applications

Build Your Path to a Scalable and Resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free

Putting It Together: Key Decisions for Dimension Keys

To summarize the decision framework: for Type 0 and Type 1 dimensions, use the Type 1 hash keys from Hubs and Links directly — they’re already available throughout the model and work cleanly in distributed environments. If storage is a concern, convert to binary hash values in the view layer before considering sequences. If sequences are genuinely required, generate them in a Computed Satellite in the Business Vault rather than embedding them in Hub or Link structures.

For Type 2 dimensions, use the Dimension Hash Key from the PIT Table as the primary key of the dimension. Connect facts to Type 2 dimensions by joining the Bridge Table or Link to the PIT Table’s alternate key inside the fact view, surfacing the Dimension Hash Key as the dimension reference. This keeps the dimensional model clean, the keys stable, and the join logic encapsulated where it belongs.

To go deeper on PIT Tables, dimension modeling, and the full Data Vault delivery layer, explore our Data Vault certification program. And for a concise introduction to the core concepts, the Data Vault Handbook is available as a free physical copy or ebook.

Watch the Video

Leave a Reply

Close Menu