Data Vault Links
Data Vault modeling separates data into hubs, links, and satellites to support scalable, auditable, and historized data warehouses. Links represent relationships between business entities (hubs) — for example, which supplier delivers which part. Coalesce.io makes creating Data Vault objects fast by providing a GUI-driven workflow for building stages, generating hash keys, and creating links that can be run incrementally against a target like Snowflake.
This article walks through the exact process demonstrated in the video transcript: exploring source tables, creating a Data Vault stage, generating hash columns, building a single-source link, then extending it to a multi-source link. The goal is to give you a clear checklist and practical tips so you can reproduce the steps in your Coalesce.io environment.
In this article:
- Why create a Data Vault link?
- Overview of the source data used
- Step 1 — Create a Data Vault stage in Coalesce
- Step 2 — Generate hub hash keys and the link hash key
- Step 3 — Select only the columns you need for the link
- Step 4 — Configure the link object
- Step 5 — Inspect results and generated code
- Creating a multi-source link
- Common pitfalls & best practices
- When to add satellites
- Conclusion
- Watch the Video
- Meet the Speaker
Why create a Data Vault link?
In Data Vault, links are the canonical way to model relationships between business entities. A link contains:
- a link hash key — generated from the business keys of all participants
- the hub hash keys of every participating hub
- load metadata such as load timestamp and record source
Creating links gives you an integrated relational layer independent of the transactional sources. Links allow you to track relationship history and stitch together hubs for downstream analytics and satellites.
Overview of the source data used
In the example from the video, you start with two resource tables in Coalesce:
- parts — defines parts with attributes like brand, name, type, size, and contains a reference to a supplier.
- suppliers — contains supplier metadata and a supplier key.
The parts table contains a column that references the supplier key, so semantically there is a relationship: part → supplier. This is the relationship we model as a Data Vault link.
Step 1 — Create a Data Vault stage in Coalesce
Every Data Vault object creation in Coalesce.io should start from a Data Vault stage. The stage is where you prepare the source rowset and add the derived columns that your downstream hub/link/satellite will need (hash keys, record source, load date, etc.).
- Right-click the source entity (in the video:
parts
) and select Node → Data Vault for Coalesce → Stage. - Coalesce.io creates a new object (e.g.,
DV_stage_parts
) and forwards the source columns into the stage. - Add any extra columns you need for the link: a hash key for the part hub, a hash key for the supplier hub, and the combined link hash key.
Best practice: keep the naming consistent. If the column will become the hub hash key for part
, name it something like HK_part_H
. That makes it obvious where the column flows later.
Step 2 — Generate hub hash keys and the link hash key
Hash keys are central to Data Vault 2.0 implementations. They provide stable, compact identifiers for business keys and are typically generated using a deterministic hash function (often MD5 or SHA). In Coalesce.io you can generate these with the GUI.
- Right-click the business key column for the part and choose Generate hash column — rename it to
HK_part_H
. - Repeat for the supplier business key and call it
HK_supplier_H
. - Select both business key columns (or both generated hash columns) and choose Generate hash column again to produce the link hash. Name it something descriptive, like
HK_part_supplier_L
(L for link).
Important: the link hash must be calculated from the business keys (or their hub hashes) of all relationship participants and in a deterministic order. That ensures the same relationship always produces the same link hash across sources and loads.
Step 3 — Select only the columns you need for the link
Coalesce.io lets you choose which stage columns flow into the next object. For the link you typically need:
- the link hash key (first column)
- the hub hash keys for all participants (in order)
- load metadata:
LoadDate
orLoadId
, andRecordSource
Select these five (or more if you want custom metadata) and then create the Data Vault link node by right-clicking → Node → Data Vault for Coalesce → Link.
Step 4 — Configure the link object
When the new link object appears, follow these checks:
- Confirm the link hash column is the first column and matches your naming standard (e.g.,
HK_part_supplier_L
). - Verify the hub hash keys follow —
HK_part_H
thenHK_supplier_H
. - Make sure load metadata (load timestamp/id and record source) are present.
- Open the Data Vault options panel in Coalesce.io and explicitly set which column is the link hash. This tells the Coalesce.io macro how to populate the link.
Click Create and then Run. Coalesce.io will generate and execute an INSERT statement (an incremental load) against your target (Snowflake in the example).
Tip: check the generated SQL before running. Coalesce.io usually issues an incremental INSERT that only adds new link rows, so the second run often returns “0 rows inserted” when there are no new relationships — that’s expected and desirable.
Step 5 — Inspect results and generated code
After running, Coalesce.io shows the result set and the generated SQL. In the video, the first execution inserted the bulk of rows (e.g., hundreds of thousands), and subsequent runs inserted zero because no new relationships existed. This shows the incremental behavior is functioning correctly.
-- example pseudo-SQL generated by Coalesce
INSERT INTO dv_link_part_supplier (...)
SELECT ...
FROM staging.dv_stage_parts
WHERE NOT EXISTS (
SELECT 1 FROM dv_link_part_supplier l
WHERE l.hk_link = staging.hk_part_supplier_l
);
Always validate the counts and confirm that the link hash values are unique per relationship. If you see duplicates or mismatches, re-check the hash generation order and the columns used.
Creating a multi-source link
A common Data Vault case is that the same relationship may appear in multiple source systems (e.g., ERP, procurement feed, third-party data). Data Vault links are designed to aggregate relationship evidence across sources. Coalesce.io supports multi-source links by allowing you to add additional source mappings to the link object.
The workflow from the video:
- Create or update a stage for the second (or additional) source that produces the same five columns (link hash, hub hashes, load metadata).
- On the link object in Coalesce, open the multi-source mappings and add a new source mapping (e.g.,
part_sub_source
). - Map the source stage columns to the link columns — Coalesce.io often auto-maps if names match.
- Run the link. Coalesce.io will process both sources and insert any newly observed relationships.
Note: If you run into an error where Coalesce.io cannot find the link hash for the new source, make sure the stage includes the generated join or column you intended to hash. In the video, the speaker realized they needed to explicitly build the join in the stage (generate the joined dataset) before the link could reference its hash column.
Common pitfalls & best practices
- Hash ordering: Always use a consistent ordering for participants when generating the link hash (e.g., alphabetical by object name or a documented canonical order).
- Naming conventions: Adopt a clear naming convention:
HK_<object>_H
for hub hashes andHK_<a>_<b>_L
for link hashes. Consistency helps Coalesce.io auto-map and keeps downstream ETL predictable. - Stage first: Always prepare your stage before creating a link. The stage is where joins, derived fields, and hash generation happen.
- Record source & load metadata: Always include record source and a load timestamp or load ID in the link so you can trace where and when a relationship was observed.
- Incremental testing: Run the insert once to ingest the historical baseline, then run it again to verify zero rows are inserted when no changes exist.
- Multi-source growth: Plan for adding multiple sources over time — links should be able to accept additional source mappings without rework.
When to add satellites
Links can also have satellites if you want to capture attributes specific to the relationship (for example, contract terms, effective dates, or relationship status). If you need to historize relationship attributes, create a satellite for the link and drive it with a hash diff or change detection strategy.
The video skipped satellite configuration because its focus was link creation, but be mindful that links + satellites are the full pattern for historized relationship data in Data Vault.
Conclusion
Creating Data Vault links with Coalesce.io is a straightforward, GUI-assisted process once you follow a repeatable pattern:
- Create a Data Vault stage for your source rows.
- Generate hub hash keys for all participating entities.
- Generate a deterministic link hash from the participants’ business keys (or hub hashes).
- Select the required columns and create the link object.
- Configure multi-source mappings as needed and run incremental loads.
Following these steps ensures your links are consistent, auditable, and ready for enterprise-grade analytics. If you haven’t implemented hubs yet, consider building hubs first (or in parallel) so your link hub keys map cleanly into the rest of the Data Vault model.
Happy modeling — and if you’re using Snowflake as your target, double-check the generated SQL from Coalesce.io and watch the run results to validate incremental behavior.
Watch the Video
Meet the Speaker

Tim Kirschke
Senior Consultant
Tim has a Bachelor’s degree in Applied Mathematics and has been working as a BI consultant for Scalefree since the beginning of 2021. He’s an expert in the design and implementation of BI solutions, with focus on the Data Vault 2.0 methodology. His main areas of expertise are dbt, Coalesce, and BigQuery.