Data Vault Business Keys
Data Vault architects often encounter a common challenge when their source systems mix surrogate technical IDs with true business keys. In one real-world scenario, an Employee
table used a technical ID
as its primary key, while the legitimate business identifier was NBR
. Downstream tables—even a self-referencing manager hierarchy—used that ID
as a foreign key. When building Hubs on both ID
and NBR
and linking them with a Same-As Link, the result resembles a Source Vault design—and it quickly becomes unwieldy to repeat ID→NBR
lookups for every referencing table.
In this article, we’ll explore the best practice for handling mismatched keys: pre-joining business keys in your staging (or view) layer before loading the Raw Data Vault. You’ll learn why this denormalization is fully compliant with Data Vault principles, how to implement it (even with CDC feeds), and when you might need alternate approaches.
In this article:
- Why Separate Business Keys from Technical IDs?
- The Pitfall: Source Vault by Accident
- Best Practice: Pre-Join Business Keys in Staging
- Tool Support: dbt & FlowBI
- Handling CDC & Empty Deltas with Forward Lookup
- Caveats: Pre-Join Within a Single Source Only
- When You Can’t Pre-Join: Source Vault as Last Resort
- Step-By-Step Implementation Guide
- Benefits of Pre-Joining Business Keys
- Conclusion
- Watch the Video
- Meet the Speaker
Why Separate Business Keys from Technical IDs?
Surrogate keys (ID
) offer stable integer references, simple indexing, and isolation from business rule changes. But they aren’t meaningful outside the operational schema. True business keys (NBR
)—like employee numbers, order numbers, or product SKUs—carry real-world meaning and ensure consistency across downstream BI and analytics models.
- Business Key (
NBR
): Immutable identifier used in reporting, cross-system integration, and audit. - Technical ID (
ID
): Auto-generated surrogate for OLTP performance and referential integrity.
When you build a Hub on the business key, all Links and Satellites must reference that same key. Mixing in surrogate IDs without conversion violates business lineage and forces repetitive lookups.
The Pitfall: Source Vault by Accident
Faced with source tables referencing ID
, some teams create:
- A Hub on
ID
(surrogate), - A Hub on
NBR
(business key), - A Same-As Link between them to tie
ID↔NBR
.
This “Source Vault” pattern captures technical IDs as though they were business keys—contradicting the principle that your Vault’s integration key must be a shared business identifier. Moreover, every time you load any Link or Satellite that uses ID
, you must look up NBR
via the Same-As Link. Tedious, error-prone, and defeating the agility of your Data Vault.
Best Practice: Pre-Join Business Keys in Staging
Data Vault training explicitly allows you to reshape your staging area—denormalizing or normalizing source data to simplify Raw Vault loads. Pre-joining means: before your load process begins, enrich every source record with the true business key (NBR
) rather than the surrogate ID
. You then feed the Hub/Satellite/Link loaders with business keys directly—no Same-As Link gymnastics required.
How It Works
- Create a view or staging query that joins your
Employee
table (onID
) to itself or to the hierarchy table to retrieveNBR
. Add theNBR
field into every downstream staging record. - Use that pre-joined staging view as the source for your Data Vault loaders. All Hubs, Links, and Satellites can now reference
NBR
consistently. - Drop the accidental Hub on
ID
and Same-As Link—your Vault only contains the true business key Hub (Hub_Employee
onNBR
).
This approach turns the repeated lookup problem into a one-time denormalization, improving performance and maintainability.
Tool Support: dbt & FlowBI
Modern Data Vault toolkits recognize this pattern:
- datavault4dbt package: Offers a
pre_join
feature to automatically enrich staging tables with business keys. - FlowBI: Includes configuration options to map surrogate IDs to business keys before Vault loads.
When your staging area resides on a cloud data lake (e.g., AWS S3 + Redshift Spectrum, Azure Data Lake + Synapse), these tools can reference external tables and materialize pre-joined views seamlessly.
Handling CDC & Empty Deltas with Forward Lookup
Change Data Capture (CDC) introduces a nuance: sometimes the Employee
record doesn’t change (NBR
remains the same), but the hierarchy table (manager assignment) does. A pure inner-join staging view would omit the hierarchy change because no new employee row appeared.
Forward Lookup Pattern
- When your CDC batch contains only hierarchy changes, load your staging view with the
ID
column but no accompanyingNBR
. - Instead of joining to the source
Employee
table (which has no new row), perform a forward lookup against the target Satellite in your Raw Vault. That Satellite already stores every historic mapping ofID→NBR
. - Retrieve the latest
NBR
value for eachID
and inject it into your staging records—just as if you had joined to the source.
This assumes your CDC infrastructure and initial loads correctly populated the Satellite. If CDC reliability is questionable, you may need to capture ID
first and resolve to NBR
later in a Business Vault layer.
Caveats: Pre-Join Within a Single Source Only
While you can (and should) pre-join within one source system, avoid chaining pre-joins across multiple systems. If you first wait for System A’s staging load to produce NBR
from its ID
, and then join System B’s staging to A’s data, you introduce cross-system load dependencies. That forces you to serialize loads—waiting for one system’s batch to finish before you can process another.
To maintain parallel ingestion, each source should be pre-joined only to its own business keys. If two systems share a business key, let that intersection happen downstream in your Vault (via Hub on the shared business key), not in the staging layer.
When You Can’t Pre-Join: Source Vault as Last Resort
In rare environments where you cannot reliably pre-join—legacy databases with locked-down permissions or untrustworthy CDC—you may fall back to a Source Vault. In this design:
- Your Hubs use surrogate
ID
as the primary key (capturing the technical ID). - You defer mapping to true business keys into the Business Vault layer, after all sources land.
Source Vaults make sense only when staging denormalization is impossible. Otherwise, they sacrifice business clarity for expedience.
Step-By-Step Implementation Guide
- Identify Business Keys: Catalog each table’s true business key(s)—not the surrogate PKs.
- Build Staging Views: For each source, create a view that
LEFT JOIN
s the “owner” table back to itself (or to its lookup tables) to pull inNBR
whereverID
appears. - Validate Keys: Ensure every staging record includes a non-null business key. Flag or quarantine any orphans (IDs without known business key).
- Configure Load Scripts: Point your Vault loaders (Hubs, Links, Satellites) at these staging views. Remove any loaders that target surrogate key Hubs or Same-As Links.
- Implement Forward Lookup: For CDC batches that may omit source changes, add a fallback join to the Employee Satellite in your load script to fetch the last known
NBR
for eachID
.
Benefits of Pre-Joining Business Keys
- Simplicity: One denormalization step replaces dozens of repetitive lookups.
- Performance: Staging views optimize key enrichment in set-based SQL rather than row-by-row Link loads.
- Lineage: Your Raw Vault contains only true business keys, preserving clear end-to-end lineage.
- Maintainability: Future source schemas that reference
ID
get mapped automatically via the shared staging view logic.
Conclusion
Mismatched surrogate and business keys need not derail your Data Vault design. By embracing pre-joining business keys in your staging layer—along with forward lookups for CDC edge cases—you preserve a clean, business-centric Vault model without cumbersome Same-As Links. Reserve Source Vaults only for environments where staging denormalization simply cannot occur. With these best practices, your Vault remains performant, transparent, and aligned with true business identifiers.
Watch the Video
Meet the Speaker

Michael Olschimke
Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!