Skip to main content
search
0
Scalefree Knowledge Webinars Data Vault Friday How to Deal With Late Arriving Data

Late Arriving Data

Late arriving or backdated data is a common challenge in data warehousing. In Data Vault, it is important to distinguish between the technical timeline used for loading data and the business timeline representing when events actually occurred in the real world.



1. Technical Timeline vs Business Timeline

When loading data into the Raw Vault, always use a Load Date Timestamp (LDTS):

  • Set when the record first arrives in your target system (landing zone, data lake, or Raw Vault).
  • Never backdate this timestamp—it should always move forward.
  • Used for incremental loading, delta detection, and reproducibility of snapshots.

This timestamp does not reflect the real-world timing of the data. It is purely a technical artifact to track ingestion order.

2. Capturing the Business Timeline

To handle late arriving or backdated data, use descriptive business dates stored in your satellites, such as:

  • Apply Date / Effective Date: When the data became valid in the source system or real world.
  • Last Modified Date: When the record was last changed in the source system.

These business timestamps allow you to create snapshots or temporal views that reflect the true order of events.

3. Timeline Corrections Without an Extended Tracking Satellite

You can correct timelines without adding additional satellites by leveraging the business timestamps stored in your existing satellites:

  1. Create temporal PIT tables or snapshots based on the business timeline, not the load date.
  2. When late-arriving data is detected:
    • Option 1: Rebuild the affected snapshots to include the late data.
    • Option 2: Apply counter transactions to reverse previous measures and apply the updated values.
  3. Always keep the load date unchanged—it only tracks ingestion, not validity.

This approach ensures that your historical reports reflect the correct business sequence without complicating the Raw Vault model.

4. Practical Guidelines

  • Do not order or aggregate data using the load date when interpreting or reporting; always use business dates.
  • Maintain separate timelines:
    • Load Date: Technical, for data ingestion and reproducibility.
    • Business Date: For interpretation, analysis, and handling late arrivals.
  • Rebuild snapshots or use counter transactions as necessary when late data affects measures or aggregates.

Summary

Late arriving data can be handled in Data Vault without adding extra tracking satellites by clearly separating technical and business timelines. Load Date timestamps remain forward-only, while satellites store business dates to drive temporal snapshots and corrections. Using temporal PIT tables, counter transactions, or snapshot rebuilding ensures your analytics reflect the real-world timeline accurately.

Watch the Video

Meet the Speaker

Profile Photo of Marc Winkelmann

Marc Winkelmann
Managing Consultant

Marc is working in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on Data Vault 2.0 implementation and coaching. Since 2016 he is active in consulting and implementation of Data Vault 2.0 solutions with industry leaders in manufacturing, energy supply and facility management sector. In 2020 he became a Data Vault 2.0 Instructor for Scalefree.

The Data Vault Handbook

Build your path to a scalable and resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free

Leave a Reply

Close Menu