Data Vault Friday: Capturing Changing Inventory Levels in Data Vault

Inventory data presents a specific challenge in Data Vault modeling: it arrives as snapshots, it can be corrected retroactively, and deletions need to be traceable back to a specific snapshot date. A user recently proposed a solution using a multi-active Snapshot Satellite — a reasonable starting point — but the question of how to handle corrections and logical deletes at the snapshot level pointed toward a cleaner approach. This post walks through the recommended pattern: a Non-Historized Link with technical counter transactions.



Capturing Changing Inventory Levels: Understanding the Source Data

The scenario is this: a source system sends inventory data as full snapshots. Each extract may contain data for multiple snapshot dates, and if a correction exists for a previously loaded snapshot, the source resends the complete data for that snapshot date. The inventory level represents the count of a product at a given location and store at the end of a specified day.

This is what Kimball defines as a snapshot-based fact — a periodic full count of all products in a warehouse, day by day. The complication is that those snapshots can be corrected. Three days after an initial count, a miscalculation might be identified and the inventory level for that past snapshot date needs to be updated — without affecting the two more recent snapshots.

Why the Multi-Active Satellite Approach Has Limitations

The proposed solution used a Link referencing Product, Store, and Location Hubs, with a multi-active Satellite attached. The snapshot date was nominated as the multi-active attribute, added to the Satellite’s primary key alongside the hash key and Load Date Timestamp.

The instinct is sound — you need to track multiple snapshots and their corrections over time. But there is a problem with using a business-supplied date as the multi-active attribute in the Raw Data Vault: it means trusting the source, and that is a risk worth avoiding.

A more robust approach is to use the subsequence from staging as the multi-active attribute instead. The subsequence is assigned during staging and is guaranteed to be unique within each incoming batch. Because it’s unique within the batch, it’s also unique within any subgroup of that batch — including a multi-active group. If something goes wrong with it, it’s a problem you control and can fix. The snapshot date from the source then becomes what it actually is: a descriptive business timeline, added to the Satellite payload like any other descriptive attribute.
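A minimal sketch of how a staging process might assign that subsequence, assuming rows are grouped by the hash key of the parent structure (field names here are illustrative, not the actual staging schema):

```python
from itertools import groupby

def assign_subsequence(batch):
    """Assign a 1-based subsequence within each hash-key group of a
    staging batch. Because the counter restarts per group and each row
    lands in exactly one group, the subsequence is unique within the
    batch, and therefore within any multi-active group derived from it."""
    keyfunc = lambda row: row["hash_key"]
    staged = []
    for _, rows in groupby(sorted(batch, key=keyfunc), key=keyfunc):
        for subseq, row in enumerate(rows, start=1):
            staged.append({**row, "subseq": subseq})
    return staged

batch = [
    {"hash_key": "a1", "snapshot_date": "2024-05-01", "level": 10},
    {"hash_key": "a1", "snapshot_date": "2024-05-02", "level": 12},
    {"hash_key": "b2", "snapshot_date": "2024-05-01", "level": 7},
]
staged = assign_subsequence(batch)
# the two "a1" rows get subsequence 1 and 2; the "b2" row gets 1
```

The snapshot date travels along untouched, as an ordinary payload attribute.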

But even with this correction, the multi-active Satellite approach doesn’t cleanly solve the core problem: how do you capture a logical delete at the snapshot level — when an item disappears from a re-sent snapshot that was already loaded?


The Cleaner Solution: A Non-Historized Link

The cleaner solution is to use a Non-Historized Link rather than a standard Link with a Satellite. Non-Historized Links are designed precisely for capturing facts coming from a source — transactions, events, and inventory levels. They sit close to the incoming data structure and are loaded incrementally using an Alternate Key.

For this scenario, the Non-Historized Link contains the following fields:

  • Product Key, Store Key, Location Key — the Hub references (hash keys)
  • Load Date Timestamp (LDTS) — when the record was received
  • Snapshot Date (SD) — the business timeline from the source, indicating when the inventory was counted
  • Snapshot Date Timestamp (SDTS) — the Data Vault snapshot date used in information delivery
  • A counter column (X) — with a value of 1 for records coming from the source, and -1 for technical counter transactions

The Alternate Key — used to identify whether a record already exists in the target — is the combination of Product Key, Store Key, Location Key, Snapshot Date, and Load Date Timestamp.
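As a quick sketch, the structure above could be modeled like this (field names are illustrative; the Alternate Key is simply a tuple over the listed fields):

```python
from collections import namedtuple

# One row of the Non-Historized Link, mirroring the field list above.
LinkRow = namedtuple(
    "LinkRow",
    ["product_key", "store_key", "location_key", "ldts", "sd", "sdts", "x"],
)

def alternate_key(row):
    """Alternate Key: Hub references + Snapshot Date + Load Date Timestamp."""
    return (row.product_key, row.store_key, row.location_key, row.sd, row.ldts)

row = LinkRow("p1", "s1", "l1", "2024-05-03T02:00", "2024-05-01",
              "2024-05-03T02:00", 1)
```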

Two INSERT Statements: Inserts and Counter Transactions

Loading this Non-Historized Link uses two parallel INSERT statements, not an UPDATE pattern.

INSERT statement one loads all records from the Staging Area where the Alternate Key does not yet exist in the target. This covers both genuinely new records and new versions of corrected records — because the corrected version, with its new Load Date Timestamp, doesn’t exist in the target yet and therefore qualifies as new.

INSERT statement two handles deletions and replaced versions. It identifies records that exist in the target but are absent from the current full-load snapshot in the Staging Area. These records are either hard-deleted in the source or represent the old version of an updated record. For each of these, a technical counter transaction is inserted — an identical row with a counter value of -1 instead of 1.

This means an update in the source produces two rows in the target: a -1 counter transaction that cancels the old version, and a 1 insert for the new version. The data itself is never modified — only the structure changes. This aligns with the hard rule in Data Vault: the Raw Data Vault transforms structure, not content.

How Aggregation Reveals the True Inventory Level

The power of this pattern becomes clear at query time. By summing the counter column across all rows that share the same Hub references and Snapshot Date, you can determine the current state of any inventory record at any point in time.

Here’s how it plays out across several days for a single product-store-location combination:

  • Day 2: Initial snapshot for Day 1 is loaded. Counter value: 1. Sum = 1. One active inventory record.
  • Day 4: A correction arrives for Day 1. The old version gets a -1 counter transaction; the corrected version gets a 1 insert. Sum = 1 - 1 + 1 = 1. Still one active record, now at the corrected level.
  • Day 5: The record is deleted from the source for Day 1. A -1 counter transaction is inserted. Sum = 1 - 1 + 1 - 1 = 0. Zero active records. The inventory level for that snapshot date is gone.

By filtering on the Snapshot Date and aggregating the counter column, you always know exactly how many active facts exist for a given snapshot — and what the current inventory level is. By filtering on the Load Date Timestamp, you can also travel back in time and see what the inventory looked like from the perspective of any earlier date. This gives you both a corrected view of inventory history and a full audit trail of when corrections were made.
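Both the day-by-day walkthrough and the load-date time travel can be replayed as a small aggregation over the counter column (the rows and inventory levels are illustrative):

```python
# Rows for one product/store/location and one snapshot date:
# (load date, inventory level, counter) triples mirroring the timeline.
rows = [
    ("day 2", 100, 1),   # initial load of the day-1 snapshot
    ("day 4", 100, -1),  # counter transaction cancelling the old version
    ("day 4", 105, 1),   # corrected inventory level
    ("day 5", 105, -1),  # counter transaction for the source-side delete
]

def active_count(rows, as_of=None):
    """Sum the counter column, optionally only for loads up to `as_of`
    -- i.e. the inventory as it looked from that date's perspective."""
    return sum(x for ldts, _, x in rows if as_of is None or ldts <= as_of)

print(active_count(rows, as_of="day 2"))  # 1 -- one active record
print(active_count(rows, as_of="day 4"))  # 1 -- still one, now corrected
print(active_count(rows))                 # 0 -- deleted from the snapshot
```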


Performance and Reporting

On a column-based storage engine, this pattern performs extremely well. Aggregating a counter column across large volumes of inventory records is fast, and the resulting structure integrates cleanly with dashboards. Business users see dimension references for Product, Store, and Location; a snapshot date for time-based aggregation; the inventory level measure; and the counter column for filtering active records. The behavior is exactly what they’d expect — current inventory levels that reflect corrections, with history available on demand.

Where This Pattern Also Applies

Technical counter transactions in Non-Historized Links are not specific to inventory data. The same pattern applies anywhere you have full-load snapshot data with corrections — any scenario where an update in the source needs to be represented as a deletion of the old version and an insertion of the new one, without modifying existing rows.

The pattern can also be applied in the Business Vault, particularly in Bridge Tables, when the counter transactions involve calculated values that need to be corrected rather than raw source data. The principle is the same; the layer where it’s applied depends on whether the transformation is structural (Raw Data Vault) or involves business logic (Business Vault).

For more recordings on Non-Historized Links and technical counter transactions, the Scalefree YouTube channel has several dedicated sessions on the topic. And to master these patterns hands-on, explore our Data Vault 2.1 Training & Certification. The free Data Vault handbook — available as a physical copy or ebook — is also a great starting point for the core concepts.
