Skip to main content
search
0
Scalefree Knowledge Webinars Data Vault Friday How to Model Multi-Type Business Keys in Data Vault

Multi-Type Business Keys

One of the more subtle challenges in Data Vault modeling appears when a single source delivers multiple types of business keys in one structure. Things get even more complex when those keys:

  • Point to different business objects depending on a type indicator
  • Have different formats than the “real” business keys used in existing hubs
  • Require cleansing or transformation before they can be linked correctly

In this article, we’ll walk through how to model such a scenario in Data Vault, how to think about Links and key types, and—most importantly—where business logic and transformations should live.



The Problem Statement

Let’s restate the scenario in simple terms.

Your source table looks something like this:

| ItemBK | KeyType | KeyValue |
|--------|---------|----------|
| ITEM1  | A       | T123     |
| ITEM1  | B       | 999-45   |

From a business perspective:

  • ItemBK identifies an item (one clear Hub candidate)
  • KeyType determines which business object the key refers to
  • KeyValue contains the actual identifier

The complication comes from the target data model:

  • For KeyType A, the key belongs to Hub A, but the real business key is 123, not T123
  • For KeyType B, the key belongs to Hub B and can be used as-is

So the value T123 must eventually link to Hub A using the cleaned value 123.

This raises three classic Data Vault questions:

  • How should this be modeled from staging to the Raw Data Vault?
  • Should the key type be stored in the Link?
  • Where should key cleansing and transformation happen?

First Principle: Preserve the Source

Before jumping into modeling options, let’s clarify a core Data Vault principle:

The Raw Data Vault must always allow you to recreate the original source delivery.

This means:

  • No irreversible transformations in staging
  • No hidden business rules applied too early
  • Full auditability back to the source

Every modeling decision below is guided by that principle.

Identifying the Core Business Objects

From the example, we can already identify several business concepts:

  • Item → clearly a Hub (Hub Item)
  • KeyValue → a business key, but its meaning depends on KeyType
  • Relationships → Item is related to different business objects depending on KeyType

This is a common pattern: a single source column contains references to different hubs.

Modeling Option 1: Pivot the Source (When Key Types Are Fixed)

If—and this is a big if—you can guarantee that:

  • The number of key types is small
  • The set of key types is stable (e.g. only A and B)

Then a pivoted staging approach can work.

What This Looks Like

You pivot KeyType into separate columns:

| ItemBK | KeyValue_A | KeyValue_B |
|--------|------------|------------|
| ITEM1  | T123       | NULL       |
| ITEM1  | NULL       | 999-45     |

This allows you to:

  • Load Hub Item from ItemBK
  • Load Hub A from KeyValue_A
  • Load Hub B from KeyValue_B
  • Create Links between Item and each Hub

Limitations

This approach breaks down quickly when:

  • A new key type appears
  • Key types are dynamic or unknown
  • The source evolves frequently

In those cases, pivoting creates a brittle staging layer and forces frequent refactoring.

Modeling Option 2: Generic Hub for Typed Keys

When key types are dynamic or extensible, a more robust pattern is required.

In this case, treat the combination of KeyType + KeyValue as a business key.

Create a Generic “Key” Hub

This hub represents “external identifiers” regardless of their semantic meaning.

  • Hub Key
    • Business Key = KeyType + KeyValue

Why composite?

  • Key values may overlap across types
  • Only the combination uniquely identifies the business object

You then:

  • Load Hub Item from ItemBK
  • Load Hub Key from (KeyType, KeyValue)
  • Create a Link between Item and Hub Key

This guarantees:

  • Full preservation of the source
  • No assumptions about future key types
  • Clean separation of concerns

But What About the Existing Hubs (Hub A, Hub B)?

This is where many teams are tempted to inject business logic too early.

Yes, you already have Hub A and Hub B. But the mapping logic that says:

  • “If KeyType = A, this belongs to Hub A”
  • “If KeyType = B, this belongs to Hub B”

is business logic.

Business logic does not belong in the Raw Data Vault.

The Correct Place: Business Vault

In the Business Vault, you can:

  • Create derived business hubs
  • Filter by KeyType
  • Union keys from different raw sources

For example:

  • Business Hub A = existing Hub A UNION keys from Hub Key where KeyType = ‘A’

This approach ensures that:

  • Raw data remains unchanged and auditable
  • Business rules are explicit and versionable
  • Changes in logic don’t force Raw Vault reloads

The answer depends on the modeling option.

Option 1 (Pivoted Source)

No.

Once pivoted, the key type is implicit in the structure. There is no longer a “KeyType” concept in the Link.

Option 2 (Generic Key Hub)

Again, no.

The key type becomes part of the Hub business key. The Link simply connects Item to the generic Key Hub.

If—and only if—key values are guaranteed to be unique across types, an alternative is:

  • Store KeyType in a Satellite on the Hub

But in most real-world cases, using a composite business key is safer.

Where Should Key Cleansing and Transformation Happen?

This is the most important question.

In the example:

  • T123 must become 123 to link to Hub A

This is a business rule.

What Not to Do

  • Do not strip the “T” in staging
  • Do not load cleaned values directly into the Raw Hub

Why?

  • The rule may change
  • You lose the original source value
  • You break auditability

Instead, load the raw value T123 as-is and create a Same-As Link.

This link connects:

  • The “raw” business key (T123)
  • The “master” business key (123)

Benefits:

  • Full traceability
  • Explicit business logic
  • Flexible remapping if rules change

Same-As Links are purpose-built for this kind of semantic deduplication.

Putting It All Together

A robust end-to-end solution looks like this:

  • Stage data exactly as delivered
  • Load ItemBK into Hub Item
  • Load (KeyType + KeyValue) into a generic Hub
  • Link Item to the generic Hub
  • Apply business logic in the Business Vault
  • Use Same-As Links to handle key transformations

Key Takeaways

  • Multi-type business keys are common—and tricky
  • Preserve the source structure in the Raw Data Vault
  • Avoid early cleansing and transformation
  • Use composite business keys when uniqueness is not guaranteed
  • Apply business rules in the Business Vault
  • Use Same-As Links for key normalization

Handled correctly, this pattern gives you maximum flexibility, auditability, and long-term stability—exactly what Data Vault was designed for.

Watch the Video

Meet the Speaker

Julian Brunner Senior Consultant

Julian Brunner
Senior Consultant

Julian Brunner is working as a Senior Consultant at Scalefree and studied Business Informatics and Business Administration. His main focus is on Business Intelligence, Data Warehousing and Data Vault 2.0. As a certified Data Vault 2.0 Practitioner he has over 5 years of experience in developing Data Platforms, especially with the Data Vault 2.0 methodology. He has successfully consulted customers from different sectors like banking and manufacturing.

The Data Vault Handbook

Build your path to a scalable and resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free

Leave a Reply

Close Menu