Part 1 – Theoretical understanding of multiple timelines in a data warehouse.
What is “Multi-Temporality” in a Data Warehouse?
Before we start talking about multi-temporality, let’s define the term bi-temporality first, given, a common misconception is that Data Vault 2.0 is only bi-temporal (which is wrong):
“Bitemporal Modeling is a specific case of Temporal database information modeling technique designed to handle historical data along two different timelines. This makes it possible to rewind the information to “as it actually was” in combination with “as it was recorded” at some point in time.” (According to: https://en.wikipedia.org/wiki/Bitemporal_Modeling)
Bi-temporality just addresses two timelines, which are commonly referred to as “System Time” (the technical timeline) and “Valid Time” (the business timeline). Data Vault Satellites, Point-in-Time tables (PIT) and Bridge tables are able to address multiple active timelines in the same record. Let’s categorise just some of them:
- Source Driven times
- Created time
- Updated time
- Deleted time
- System times
- CDC time
- Message event time
- Business times
- any times which represent when something happened or will happen in the “real world” like a purchase or sell timestamp.
- Time spans
- can be technical, can be business driven
- Contract start and end dates/times
- Technical valid from and valid to dates/timestamps
- Enterprise Data Warehouse (EDW)
- Load date timestamp (set in the first layer of the EDW during the insert)
- Timestamp when a record is written in the table
All of these dates and timestamps could be found in just one record in a Satellite table. This allows us to look at the data from multiple time perspectives. Therefore, the Data Vault Model is multi-temporal and not just bi-temporal.
The Load Date Timestamp
One requirement to realise a multi-temporal perspective on the data is that the Load Date Timestamp is used for loading data into Satellites when doing the delta check. Only the Load Date Timestamp can provide us with a consistent, gapless and non-overlapping time which is under our control. This allows us to have an unrestricted view on the multi-timelines in Satellites.
All other timestamps are not qualified. First, they would restrict the number of possible perspectives on the data to a single instance. Additionally, they can have gaps, overlappings, be NULL and are not controlled by the Enterprise Data Warehouse teams.
In short: We will never get rid of the Load Date Timestamp which is set during the insert in the first layer of the Enterprise Data Warehouse architecture and pushed through all the layers as far as possible (think of aggregates in the Business Vault over multiple Load Date Timestamps).
3 Different Perspectives on Data
The core Data Vault is differentiated into the Raw Data Vault (RDV) and the Business Vault (BV). The reason being is to split soft business rules from the hard business rules as soft business rules can change the content of the data. The result is that the number of possible perspectives on the raw data is reduced when soft business rules are applied early in the loading architecture. The same rules have to be applied to timelines. Timeline driven business perspectives on raw data happen earliest in the Business Vault.
There are basically three different perspectives related to timelines in the data warehouse: A data warehouse perspective, a business perspective and an information delivery perspective.
The data warehouse perspective relates to the Load Date Timestamp to have a consistent incremental integration of the data into the Raw Data Vault and Business Vault.
The business perspective relates to all dates and timestamps which are delivered by the source system. Also, the technical fields are counted in the same way as the created, updated or deleted date/timestamp from the source system. Everything that is part of the payload is handled as descriptive data during the Raw Data Vault loading.
Now, different queries can create all possible views of the raw data; for example aggregates based on the most recent record per Business Key and grouped by a sales date.
The information delivery perspective relies on a snapshot to “freeze” all the data as it was active at a point in time. That said, the interpretation of what “active” means can be different.
To address this, multiple perspectives are able to be created. That’s also the reason why we talk about the single version of the facts in the Raw Data Vault and multiple versions of the truth in the Business Vault (different perspectives on raw data = different truths from different standpoints).
This could, for example, be an hourly, daily, weekly, monthly or yearly snapshot or timespan. The Data Vault entities which are used here are the PIT and Bridge tables. The current delta of master data like customer data in a Satellite can be “frozen” based on a daily snapshot in a PIT table. Also, transactional data kept in a Non-Historized Link can be attached to an hourly snapshot in a Bridge table.
How that exactly looks like will be shown in the next part of the multi-temporal newsletter series.
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.