Late Arriving Data
Late arriving or backdated data is a common challenge in data warehousing. In Data Vault, it is important to distinguish between the technical timeline used for loading data and the business timeline representing when events actually occurred in the real world.
1. Technical Timeline vs Business Timeline
When loading data into the Raw Vault, always use a Load Date Timestamp (LDTS):
- Set when the record first arrives in your target system (landing zone, data lake, or Raw Vault).
- Never backdate this timestamp; it should only ever move forward.
- Used for incremental loading, delta detection, and reproducibility of snapshots.
This timestamp does not reflect the real-world timing of the data. It is purely a technical artifact to track ingestion order.
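As a minimal illustration, the Python sketch below stamps each incoming batch with a single load date taken from the loading system's clock and rejects any attempt to move it backwards. The `Loader` and `SatelliteRow` names and the in-memory row list are hypothetical stand-ins for a real loading pipeline and Raw Vault satellite table. They are not a specific tool's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class SatelliteRow:
    hash_key: str        # hypothetical hub hash key
    load_date: datetime  # LDTS: technical arrival time, set by the loader
    payload: dict        # descriptive attributes exactly as delivered

class Loader:
    """Hypothetical loader that stamps every batch with a forward-only LDTS."""

    def __init__(self) -> None:
        self.last_ldts: Optional[datetime] = None
        self.rows: List[SatelliteRow] = []

    def load_batch(self, records: Iterable[Tuple[str, dict]],
                   ldts: Optional[datetime] = None) -> datetime:
        # The LDTS comes from the loading system's clock, never from the source data.
        # The optional `ldts` argument only exists so examples and tests are repeatable.
        if ldts is None:
            ldts = datetime.now(timezone.utc)
        # Forward-only: a batch may never be stamped at or before the previous one.
        if self.last_ldts is not None and ldts <= self.last_ldts:
            raise ValueError("LDTS must move forward; backdating is not allowed")
        self.last_ldts = ldts
        for hash_key, payload in records:
            self.rows.append(SatelliteRow(hash_key, ldts, payload))
        return ldts

    def rows_since(self, ldts: datetime) -> List[SatelliteRow]:
        """Delta detection: everything ingested strictly after a given LDTS."""
        return [r for r in self.rows if r.load_date > ldts]
```

Because the load date only increases, `rows_since` can act as a simple delta filter for downstream incremental loads, and replaying rows up to a given LDTS reproduces what the warehouse knew at that point.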
2. Capturing the Business Timeline
To handle late arriving or backdated data, use descriptive business dates stored in your satellites, such as:
- Apply Date / Effective Date: When the data became valid in the source system or in the real world.
- Last Modified Date: When the record was last changed in the source system.
These business timestamps allow you to create snapshots or temporal views that reflect the true order of events.
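The sketch below assumes a satellite that carries a business date, here a hypothetical `effective_date` column, next to its load date. It shows how the two timelines diverge: a record that arrives late sorts last by load date but falls into its correct place once rows are ordered by the business date.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List

@dataclass(frozen=True)
class SatRow:
    hash_key: str
    load_date: datetime   # technical timeline (LDTS)
    effective_date: date  # business timeline; hypothetical column name
    status: str           # example descriptive attribute

def business_history(rows: List[SatRow], hash_key: str) -> List[SatRow]:
    """Return one key's satellite rows in real-world order, not ingestion order."""
    own = [r for r in rows if r.hash_key == hash_key]
    # Sort by the business date; the LDTS only breaks ties when several
    # deliveries claim the same effective date (the latest delivery sorts last).
    return sorted(own, key=lambda r: (r.effective_date, r.load_date))

rows = [
    SatRow("c1", datetime(2024, 3, 1), date(2024, 3, 1), "active"),
    SatRow("c1", datetime(2024, 3, 5), date(2024, 3, 5), "suspended"),
    # Late arrival: loaded last, but it happened before the suspension.
    SatRow("c1", datetime(2024, 3, 12), date(2024, 3, 3), "flagged"),
]

print([r.status for r in business_history(rows, "c1")])
# -> ['active', 'flagged', 'suspended']
```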
3. Timeline Corrections Without an Extended Tracking Satellite
You can correct timelines without adding an extended tracking satellite by using the business timestamps already stored in your existing satellites:
- Create temporal PIT tables or snapshots based on the business timeline, not the load date.
- When late-arriving data is detected:
  - Option 1: Rebuild the affected snapshots to include the late data.
  - Option 2: Book counter transactions that reverse the previously reported measures, then book the updated values.
- Always keep the load date unchanged; it tracks ingestion only, not validity.
This approach ensures that your historical reports reflect the correct business sequence without complicating the Raw Vault model.
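A temporal PIT table built on the business timeline can be sketched in Python as follows. For each snapshot date it points to the satellite row whose business date is the latest one on or before that snapshot. After a late arrival, it recomputes only the snapshots from the late row's effective date onward (Option 1). The `SatRow` structure, the `effective_date` column, and all function names are hypothetical; a real implementation would typically express the same logic in SQL against the satellite and PIT tables.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class SatRow:
    hash_key: str
    load_date: datetime   # technical timeline (LDTS), part of the satellite key
    effective_date: date  # business timeline; hypothetical column name
    status: str

# PIT entry: (hash key, snapshot date) -> LDTS of the satellite row to join, or None
Pit = Dict[Tuple[str, date], Optional[datetime]]

def pit_entry(rows: Iterable[SatRow], hash_key: str, snapshot: date) -> Optional[SatRow]:
    """Pick the row valid on `snapshot` according to the business timeline."""
    candidates = [r for r in rows
                  if r.hash_key == hash_key and r.effective_date <= snapshot]
    if not candidates:
        return None
    # Latest business date wins; the LDTS only breaks ties between deliveries.
    return max(candidates, key=lambda r: (r.effective_date, r.load_date))

def build_pit(rows: List[SatRow], snapshots: List[date]) -> Pit:
    """Full build: one entry per hash key and snapshot date."""
    pit: Pit = {}
    for snap in snapshots:
        for hk in {r.hash_key for r in rows}:
            row = pit_entry(rows, hk, snap)
            pit[(hk, snap)] = row.load_date if row else None
    return pit

def affected_snapshots(snapshots: List[date], late_rows: List[SatRow]) -> List[date]:
    """Only snapshots on or after the earliest late business date can change."""
    earliest = min(r.effective_date for r in late_rows)
    return [s for s in snapshots if s >= earliest]

def rebuild_affected(pit: Pit, rows: List[SatRow], snapshots: List[date],
                     late_rows: List[SatRow]) -> Pit:
    """Option 1: recompute just the affected keys and snapshots in place."""
    for snap in affected_snapshots(snapshots, late_rows):
        for hk in {r.hash_key for r in late_rows}:
            row = pit_entry(rows, hk, snap)
            pit[(hk, snap)] = row.load_date if row else None
    return pit
```

Recomputing only the affected range keeps the rebuild small, and in this sketch the result matches a full rebuild over all rows. The load dates of the existing satellite rows are never modified.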
4. Practical Guidelines
- Do not order or aggregate data using the load date when interpreting or reporting; always use business dates.
- Maintain separate timelines:
  - Load Date: Technical, for data ingestion and reproducibility.
  - Business Date: For interpretation, analysis, and handling late arrivals.
- Rebuild snapshots or use counter transactions as necessary when late data affects measures or aggregates.
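Counter transactions can be sketched as an append-only ledger. When late data changes a measure that was already reported, the previously reported amount is reversed with a negative booking and the corrected amount is booked next to it. The net value per business date is then correct, while the original booking stays visible. The `Booking` structure and the function names are hypothetical, and integer amounts (for example, cents) are an assumption made to keep the sums exact.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

@dataclass(frozen=True)
class Booking:
    business_date: date  # when the measure applies in the real world
    key: str             # hypothetical business key, e.g. a contract number
    amount: int          # measure in minor units (e.g. cents) to avoid float drift
    kind: str            # "original", "reversal" or "correction"

def current_value(ledger: List[Booking], key: str, business_date: date) -> int:
    """Net value currently reported for one key and business date."""
    return sum(b.amount for b in ledger
               if b.key == key and b.business_date == business_date)

def apply_correction(ledger: List[Booking], key: str, business_date: date,
                     new_amount: int) -> List[Booking]:
    """Option 2: reverse what was reported, then book the corrected value.

    Existing bookings are never updated or deleted; the ledger only grows.
    """
    old = current_value(ledger, key, business_date)
    if old == new_amount:
        return ledger  # nothing to correct
    if old != 0:
        ledger.append(Booking(business_date, key, -old, "reversal"))
    ledger.append(Booking(business_date, key, new_amount, "correction"))
    return ledger

def totals_by_business_date(ledger: List[Booking]) -> Dict[date, int]:
    """Aggregate on the business date, never on the load date."""
    totals: Dict[date, int] = defaultdict(int)
    for b in ledger:
        totals[b.business_date] += b.amount
    return dict(totals)
```

Compared with rebuilding snapshots, this keeps previously published figures traceable (summing only the original bookings reproduces the old report), at the cost of extra rows that consumers must net out rather than read one by one.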
Summary
Late arriving data can be handled in Data Vault without adding extra tracking satellites by clearly separating technical and business timelines. Load Date timestamps remain forward-only, while satellites store business dates to drive temporal snapshots and corrections. Using temporal PIT tables, counter transactions, or snapshot rebuilding ensures your analytics reflect the real-world timeline accurately.
Meet the Speaker

Marc Winkelmann
Managing Consultant
Marc works in Business Intelligence and Enterprise Data Warehousing (EDW), with a focus on Data Vault 2.0 implementation and coaching. Since 2016, he has been consulting on and implementing Data Vault 2.0 solutions with industry leaders in the manufacturing, energy supply, and facility management sectors. In 2020, he became a Data Vault 2.0 Instructor for Scalefree.
