Same-as-Links: Enterprise-Wide Deduplication Across Multiple Sources
One of the more powerful but nuanced constructs in Data Vault is the Same-as-Link (SAL). Two questions came in recently that get at the heart of how SALs work across source systems: can a Same-as-Link have multiple sources, and can it span keys from different source systems? The answers differ depending on whether you’re working in the Raw Data Vault or the Business Vault — and understanding why reveals something fundamental about how Data Vault handles enterprise-wide deduplication and integration.
Same-as-Links and Multiple Sources in the Raw Data Vault
The first question — can a Same-as-Link have multiple sources — is straightforward. Like any Link in the Raw Data Vault, a SAL can receive records from multiple source systems. Hubs consolidate business keys from different sources into the same entity, and Links do the same for relationships. As long as the relationship has the same semantic meaning and the same granularity across those sources, loading them into the same Link is valid and correct. So yes, a SAL in the Raw Data Vault can have multiple source systems contributing records to it.
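As a rough illustration of several sources feeding one Link, the sketch below models the target SAL as an in-memory dictionary keyed by the link hash key. The function name `load_link`, the column names, and the insert-only rule are assumptions for illustration, not a prescribed loading procedure.

```python
def load_link(target: dict[str, dict], staged_rows: list[dict]) -> int:
    """Insert staged rows whose link hash key is not yet in the target.

    Insert-only: a relationship already delivered by another source is
    left untouched, so the first record source to deliver it is kept.
    """
    inserted = 0
    for row in staged_rows:
        if row["link_hk"] not in target:
            target[row["link_hk"]] = row
            inserted += 1
    return inserted


# Two independent sources deliver batches for the same SAL (hypothetical data).
sal_customer: dict[str, dict] = {}
crm_batch = [{"link_hk": "hk_1", "record_source": "CRM"}]
erp_batch = [
    {"link_hk": "hk_1", "record_source": "ERP"},  # same relationship, already known
    {"link_hk": "hk_2", "record_source": "ERP"},  # new relationship
]
load_link(sal_customer, crm_batch)
load_link(sal_customer, erp_batch)
```

Each batch loads on its own schedule, and neither load needs data from the other source.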
The second question is more nuanced: can a SAL span keys from multiple sources — meaning one Hub reference on one side of the relationship comes from System A, and the other comes from System B?
In the Raw Data Vault, the answer is generally no — with one important exception. A core principle of Raw Data Vault loading is that each row comes from exactly one source system. Loading a single row that requires joining data from two independent source systems introduces a loading dependency: you have to wait for System A before you can load data from System B. That’s precisely the kind of tight coupling the Raw Data Vault is designed to avoid. Independent source systems should load independently.
The exception is when a single source system already knows both business keys. An ERP system, for example, might reference customers by a customer number that originates in a CRM system. The ERP system carries that key as a known reference — it’s available in a single source record without requiring a cross-system join at load time. In that case, a SAL row sourced from the ERP system can legitimately reference a business key that conceptually originates elsewhere. The single-source-per-row rule still holds; the integration happened upstream, inside the source system itself.
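The ERP case can be sketched as follows. This is a minimal example that assumes MD5 hash keys over trimmed, upper-cased business keys; the field names (`erp_customer_no`, `crm_customer_no`), the output column names, and the choice of the CRM key as the master side are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash normalized business keys (trimmed, upper-cased, ';'-delimited)."""
    normalized = ";".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def sal_row_from_erp(record: dict, load_date: datetime) -> dict:
    """Build one SAL row from a single ERP record that carries both keys."""
    crm_key = record["crm_customer_no"]  # reference key the ERP already knows
    erp_key = record["erp_customer_no"]  # the ERP system's own key
    return {
        "sal_customer_hk": hash_key(crm_key, erp_key),
        "customer_master_hk": hash_key(crm_key),     # assumption: CRM key is the master
        "customer_duplicate_hk": hash_key(erp_key),
        "load_date": load_date,
        "record_source": "ERP",  # the whole row comes from one source system
    }


row = sal_row_from_erp(
    {"erp_customer_no": "E-1001", "crm_customer_no": "C-77"},
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Both hash keys are derived from the same ERP record, so no lookup against the CRM system is needed at load time.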
Same-as-Links in the Business Vault: Cross-Source Deduplication
In the Business Vault, the picture is quite different — and this is where SALs really show their value. When two independent source systems use completely different, unrelated business keys for what is actually the same real-world entity, there’s no source-level relationship to load. The Raw Data Vault captures both sets of keys in the same Hub (since they represent the same business concept), but there’s nothing in the source data to connect them.
This is where calculated Same-as-Links come in. Using descriptive data from both systems — names, addresses, contact details — fuzzy matching logic can identify that business key A from System A and business key B from System B refer to the same entity. That determination is a business rule. It belongs in the Business Vault. The result is a SAL entry that spans two business keys from completely independent source systems, calculated from the data rather than loaded from any single source.
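A calculated match of this kind might look roughly like the sketch below. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever matching engine is actually in use; the 0.85 threshold, the sample records, and the function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_customers(system_a: dict[str, str], system_b: dict[str, str],
                    threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (key_a, key_b, score) for every pair at or above the threshold."""
    matches = []
    for key_a, description_a in system_a.items():
        for key_b, description_b in system_b.items():
            score = similarity(description_a, description_b)
            if score >= threshold:
                matches.append((key_a, key_b, score))
    return matches


# Hypothetical descriptive data (name and address) keyed by business key.
system_a = {"A-1": "Acme Corp, 12 Main Street", "A-2": "Globex Inc, 9 Elm Road"}
system_b = {"B-9": "ACME Corp., 12 Main Street", "B-4": "Initech LLC, 500 Park Ave"}
candidate_links = match_customers(system_a, system_b)
```

Each returned pair would become one Business Vault SAL entry. The threshold is itself part of the business rule, which is why it is an explicit parameter rather than a hard-coded constant.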
This is one of the primary use cases for Same-as-Links: not just deduplicating records within a single source system, but integrating and deduplicating entities across the enterprise. Two CRM systems, two customer databases, two product catalogs — wherever the same real-world object appears under different identifiers in different systems, a Business Vault SAL can establish the connection and enable unified reporting and analysis across all of them.
For organizations dealing with complex multi-source environments, this kind of cross-system entity resolution is one of the most tangible business value deliverables a Data Vault implementation can produce. If you’re building or evaluating an enterprise data warehouse, the SAL pattern is worth understanding deeply: it’s the mechanism that turns a collection of source-aligned Hubs into a genuinely integrated enterprise model.
Why the Raw and Business Vault Distinction Matters Here
The contrast between how SALs work in the Raw Data Vault versus the Business Vault illustrates a broader principle that runs through all of Data Vault 2.0 design: the Raw Data Vault captures what the sources deliver, as they deliver it, without interpretation. The Business Vault is where judgment, calculation, and business logic are applied.
Fuzzy matching is business logic. Deciding that two records represent the same entity is a business decision. Those decisions belong in the Business Vault — not because the Raw Data Vault can’t technically store the result, but because embedding that logic at the raw layer makes it invisible, untestable, and hard to change when the matching rules evolve.
By keeping the SAL calculation in the Business Vault, you get a clear audit trail of how the deduplication was performed, the ability to update matching logic without reloading source data, and a separation between “what the source said” and “what we believe to be true across sources.” That separation is one of the most operationally valuable properties of a well-structured Data Vault.
Practical Implications for Modeling
When modeling SALs in practice, a few things are worth keeping in mind. In the Raw Data Vault, SALs are appropriate when a single source system provides an explicit deduplication or matching relationship — a master data management export, a merge table, a golden record mapping from a source MDM system. The loading process remains clean and dependency-free.
In the Business Vault, SALs are the right tool when the matching logic needs to be calculated — whether through exact key matching across systems, probabilistic matching, fuzzy string comparison, or any other form of entity resolution. The SAL lives in the Business Vault, references the appropriate Hub twice (master and duplicate), and is populated by whatever calculation or mapping process produces the match.
In both cases, the hash keys in the SAL reference the same Hub, since by definition the master and the duplicate represent the same type of business object. This is what makes the SAL structurally elegant: it reuses existing Hub infrastructure to express an enterprise-wide identity resolution without requiring new structural entities.
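To make the two-references-to-one-Hub structure concrete, here is a small sketch of a SAL entry and of how a downstream query could resolve any Hub hash key to its master. The class and function names are hypothetical, and following chains of links (a duplicate whose master is itself a duplicate of another record) is an assumption about how the matches are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SameAsLink:
    """One SAL entry: both hash keys point into the same Hub."""
    master_hk: str
    duplicate_hk: str


def resolve_master(hub_hk: str, links: list[SameAsLink]) -> str:
    """Follow duplicate -> master references until no further master exists."""
    lookup = {link.duplicate_hk: link.master_hk for link in links}
    seen: set[str] = set()
    # The 'seen' set guards against cycles and self-referencing entries.
    while hub_hk in lookup and hub_hk not in seen:
        seen.add(hub_hk)
        hub_hk = lookup[hub_hk]
    return hub_hk


links = [
    SameAsLink(master_hk="hk_a", duplicate_hk="hk_b"),
    SameAsLink(master_hk="hk_b", duplicate_hk="hk_c"),
]
master_of_c = resolve_master("hk_c", links)
```

A key with no SAL entry resolves to itself, so reporting queries can apply the same lookup to every Hub record.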
To go deeper on Same-as-Links, Business Vault patterns, and enterprise integration strategies in Data Vault, explore our Data Vault 2.1 Training & Certification. And for a concise introduction to the full methodology, the Data Vault Handbook is available as a free physical copy or ebook.