Link Temporality: Handling Source Data Errors with Effectivity Satellites
In modern data warehousing, ensuring accurate historical records is paramount. The Data Vault methodology excels at capturing raw, unfiltered data changes over time. But what happens when your source system makes errors—linking an entity to the wrong counterpart—and then corrects them? Without the right approach, your Link tables can become confusing, making it hard to identify the true “current” relationship. This article explores an elegant solution: using an Effectivity Satellite to manage link temporality and error correction in your Data Vault.
In this article:
- The Problem: One-to-One Relationship with Source Data Fluctuations
- Why Not Tweak the Link Table Directly?
- Introducing the Effectivity Satellite
- Step-by-Step: Tracking Link Changes
- Loading Patterns: Full Loads vs. CDC vs. Incrementals
- Handling Unreliable Source Deliveries with “Last Seen”
- Querying Current Relationships
- Benefits of the Satellite Approach
- Conclusion
- Watch the Video
- Meet the Speaker
The Problem: One-to-One Relationship with Source Data Fluctuations
Imagine two hubs in your Data Vault: Hub A and Hub B. A business rule dictates that each A-entity can be linked to exactly one B-entity at a time. Your Link table models these connections. The typical workflow is:
- Day 1: Source links A1 → B1 → you load this into your Link.
- Day 2: Source mistakenly links A1 → B2 → you load the new link.
- Day 3: Source corrects back to A1 → B1 → how do you capture this as the current, up-to-date link?
Since the Link table only records distinct relationships and ignores duplicates, reloading A1 → B1 on Day 3 won’t insert a new row or update any timestamp. You lose clarity on which relationship is active today.
Why Not Tweak the Link Table Directly?
You might be tempted to add LOAD_END_DATE
or an “active” flag directly to the Link table to mark when a relationship becomes obsolete. However, this violates Data Vault best practices. The Link should remain a pure, append-only record of every relationship ever observed, without status flags or end dates. Instead, you delegate temporality to a dedicated satellite.
Introducing the Effectivity Satellite
An Effectivity Satellite sits alongside your Link and records the lifespan of each relationship. Its core columns include:
- Link Hash Key: foreign key back to your Link record
- LOAD_DATE_TIMESTAMP: when you first detected or ended this link
- DELETE_DATE_TIMESTAMP: when the link was deactivated (or a far-future “end of time” for active rows)
This design cleanly separates the static relationship definition (Link) from its dynamic, time-dependent status (Effectivity Satellite).
Step-by-Step: Tracking Link Changes
Day 1: Initial Relationship
Link: A1–B1
Effectivity Satellite:
LOAD_DATE = D1
DELETE_DATE = 8888-12-31
We load A1→B1 and mark it active by setting its DELETE_DATE
to the end of all time.
Day 2: Erroneous Change
Link: add A1–B2
Effectivity Satellite updates:
– For A1–B1: DELETE_DATE = D2 (deactivated)
– For A1–B2: LOAD_DATE = D2, DELETE_DATE = 8888-12-31
The old relationship is soft-deleted, and the new one is inserted and marked active.
Day 3: Correction Back to Original
Effectivity Satellite updates:
– For A1–B2: DELETE_DATE = D3
– For A1–B1: new row (reactivation): LOAD_DATE = D3, DELETE_DATE = 8888-12-31
Instead of touching the Link, we simply record two new deltas: ending B2 and re-activating B1. Querying the satellite for the active row (where DELETE_DATE
= 8888-12-31) reveals the current link.
Loading Patterns: Full Loads vs. CDC vs. Incrementals
Your data delivery method influences how you detect deletions:
- Full Loads: Compare all active links in the satellite against staging; any missing link implies a deletion. Insert a delta to end-date it.
- Change Data Capture (CDC): Leverage the source’s delete events and timestamps as your
DELETE_DATE_TIMESTAMP
. - Incremental without Deletes: Combine staging deltas (inserts/updates) with a lightweight full load of just business keys. Missing keys signal deletions.
In all cases, the satellite becomes the single source of truth for link effectivity.
Handling Unreliable Source Deliveries with “Last Seen”
Sometimes, your source export may inadvertently drop rows (e.g., locked mainframe records). To avoid false deletions, maintain a Last Seen Date on your effectivity satellite. If a link hasn’t been seen for a configurable “grace period” (e.g., three weeks), a business rule in your Business Vault marks it deleted. This approach balances accuracy against source system quirks.
Querying Current Relationships
To retrieve only active links at any point:
SELECT L.*, S.LOAD_DATE_TIMESTAMP
FROM Link L
JOIN Effectivity_Sat S
ON L.Link_HashKey = S.Link_HashKey
WHERE S.DELETE_DATE_TIMESTAMP = '8888-12-31';
This simple filter returns the true, live relationships, abstracting away all historical noise and source-system corrections.
Benefits of the Satellite Approach
- Auditability: Full history of when links were activated and deactivated.
- Purity: Link tables remain simple, append-only, and free of flags/end dates.
- Flexibility: Supports full loads, CDC, and incremental patterns seamlessly.
- Business Rules: “Last seen” logic can live in Business Vault without polluting raw Data Vault layers.
Conclusion
Handling source data errors—especially when relationships ping-pong between states—requires a robust temporal strategy. By delegating link effectivity to a dedicated satellite, you maintain a clean Link table, capture every change, and easily identify the current relationship. Whether you’re dealing with full loads, CDC, or flaky source exports, this pattern scales, remains auditable, and adheres to Data Vault best practices. Implement effectivity satellites in your Data Vault to master link temporality and build a more resilient, transparent data platform.
Watch the Video
Meet the Speaker

Michael Olschimke
Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!