Skip to main content
search
0
All Posts By

Michael Olschimke

Michael Olschimke is the Co-Founder and CEO of Scalefree and a "Data Vault 2.0 Pioneer" with over 20 years of IT experience. A Fulbright scholar and co-author of Building a Scalable Data Warehouse with Data Vault 2.0, Michael is a global authority on AI, Big Data, and scalable Lakehouse design across sectors like banking, automotive, and state security.

Data Vault PITs in PowerBI – Joining Type 2 Dimensions

Watch the Video

In our ongoing series, our CEO Michael Olschimke discusses a question from the audience:

“We would like either build a semantic model or let end users build it themselves as star schemas.

In Infomarts we expose facts and dimensions.

Dimensions are based on pits and expose all contextual attributes valid to a given snapshot data.

Now the problem is that facts and dimenions need to be joined not only by main keys of dimension, but also a snapshot data. However, PBI allows only joins with 1 attribute.

What is a recommended way to tackle this?

I thought of introducing sequence numbers in PITs and exposing them in virtualized fact views, additionally exposing separate snapshot dimension that synchronizes snapshots of all the dims (otherwise we end up in cartesian join). However this defeats partitioning in the PITs (join over sequence number and not hashKey + SnapshotDate blocks partition pruning).”

Michael delves into an in-depth discussion on leveraging Data Vault’s Point-in-Time (PIT) tables within PowerBI, exploring how this integration enhances analytical capabilities and supports dynamic reporting in the realm of Big Data.

Multiple PIT Tables for Different Business Scenarios

Watch the Video

In our ongoing series, CEO Michael Olschimke addresses an audience question:

“We have a need for PIT tables to be used in different business scenarios and would like to use different SQL statements to load a number of PIT tables, one for each scenario.

What is your take on it?”

PIT tables help track the historical state of records over time for analysis, and customized SQL statements cater to the specific data needs of each business context.

Michael will cover best practices, performance optimization, and data integrity when managing numerous PIT tables, offering insights that can enhance organizational data strategy.

Defining Multiple Snapshots per Day via Control Table

Watch the Video

In our ongoing series, our CEO Michael Olschimke discusses a question from the audience:

“Would a micro or mini batch refresh frequency in PIT tables of the data warehouse with subsequent aligned reporting yield multiple timestamps for a certain date in the snapshot control table?”

Michael goes on to explore the concept of multiple snapshots and how they can provide valuable insights into the evolution of data over time. By capturing snapshots at different points in time, organizations can gain a deeper understanding of trends, patterns, and anomalies within their data. This nuanced approach to data management can lead to more informed decision-making and improved overall performance.

Join us as we unravel the complexities of multiple snapshots and their impact on the data warehouse landscape.

Filtering Snapshot Date Frequencies in PIT Tables

Watch the Video

In our ongoing series, our CEO Michael Olschimke discusses a question from the audience:

“Regarding the information delivery perspective where the business user likes to have stable reports, is it correct to say that each unique frequency of a required business snapshot date (where the snapshot date is a timestamp), will have its own filter column in the snapshot control table? E.g. weekly, monthly, but also more specific ones to suit a particular business process, like in education the beginning of the nth quartile week to ensure all grades achieved are registered?”

He discusses the importance of having unique filter columns in the snapshot control table for each distinct frequency of a required business snapshot date.

By customizing snapshot date frequencies to suit different business processes, organizations can enhance the accuracy and relevance of their reporting mechanisms.

Michael’s insights highlight the strategic importance of managing snapshot dates effectively to optimize information delivery and drive better decision-making.

Soft-Deleting Records in Data Vault 2.0

Watch the Video

In our ongoing series, our CEO Michael Olschimke delves into a question raised by a member of the audience:

“Hi, my question is about the effectiveness of satellite tables. I notice there are no updates in the data vault. I am struggling to comprehend how we can close the End_Date field in the satellite table without actually updating it.”

In response to this query, Michael examines the concept of soft-deleting records and its implications on data management and integrity. Through insightful discussions and practical examples, he sheds light on the importance of implementing strategies such as soft deletion in maintaining data consistency and accuracy within satellite tables.

Hash Key and Hash Diff Computation

Watch the Video

In our ongoing series, CEO Michael Olschimke  answers a question from the audience regarding Has Key and Hash Diff computation:

“What is the recommended way for Hashing? In the case of an optional relationship, should we first replace the NULL value with the default value before hashing? What is the reason for this? Should we include the BK of parent (hub/link) in calculation of Hash Diff? The above was mentioned in your book ‘Building a scalable Data Warehouse with Data Vault 2.0’ but other blogs emphasize a loading code for a satellite that compares the latest Hash Diff per Hash Key (even if BK is included in Hash Diff). Is there a specific reason?”

Specifically, the discussion delves into whether it is advisable to replace NULL values with default values before hashing in the context of an optional relationship. The rationale behind this practice and the potential inclusion of the BK of the parent (hub/link) in the calculation of Hash Diff are explored.

This topic was previously highlighted in Michael’s book ‘Building a scalable Data Warehouse with Data Vault 2.0’, but contrasting perspectives from other blogs advocate for a loading code approach for a satellite that considers the latest Hash Diff per Hash Key, even if the BK is included in the Hash Diff.

Michael sheds light on the significance of Hash Key and Hash Diff in this discussion.

PIT and Effectivity Satellites

Watch the Video

In our continuous series, our CEO Michael Olschimke delves into a thought-provoking inquiry posed by a member of our audience:

“Why do numerous illustrations of Point-In-Time (PIT) tables utilize load datetimes instead of applied datetimes from the source, despite the fact that analysts generally aim to restore data for a specific datetime as it existed in the source? Are there instances of PIT table implementations that incorporate both applied datetimes and load datetimes?”

Michael delves into the intricacies of Point-In-Time (PIT) tables, exploring the rationale behind the prevalent use of load datetimes versus applied datetimes in these structures. He addresses the challenges and considerations in designing PIT tables that accurately reflect the state of data as per the source system, while also considering the need for historical restoration based on specific datetimes. Michael provides insights into potential approaches or hybrid models that combine applied datetimes and load datetimes to meet both analytical requirements and data reconstruction needs effectively within PIT table implementations.

Data Vault Point In Time (PIT) Tables

Watch the Video

In our continuing series, our CEO, Michael Olschimke, delves into a thought-provoking question raised by an audience member regarding Point-In-Time (PIT) tables:

“I’m trying to grasp why the majority of Point-In-Time (PIT) table illustrations do not incorporate tracking satellites to accurately reconstruct the timeline for a business key. For example, I have three snapshots for business key 001 stored in satellite1. This seems inaccurate as the record was deleted from the source. Why does this prevail?”

Michael engages in a detailed discussion on the concepts of Point-In-Time (PIT) and Effectivity Satellites, shedding light on the importance of incorporating tracking satellites for maintaining data integrity and timeline accuracy in data warehousing practices.

Dependent Child Links with Status Tracking

Watch the Video

In our ongoing series, CEO Michael Olschimke addresses a viewer’s inquiry regarding dependent child links with status tracking:

“I have a source with multiple records for one BK. I model this as a hub and a satellite with a dependent child key. As a status tracking satellite only tracks the parent key hub; how can I track the deletes of the records with the dependent child?”

The viewer presents a challenge tied to a source containing multiple records for a single Business Key (BK). Their current modeling involves a hub and a satellite with a dependent child key. However, they are grappling with how to effectively track the deletions of records associated with the dependent child, especially when a status tracking satellite typically tracks only the parent key hub.

Michael engages with this query, shedding light on strategies and best practices related to Dependent Child Links in the context of Data Vault modeling. He offers insights into addressing the specific scenario described by the viewer, providing practical guidance on tracking deletions in situations where dependent child keys are involved.

This episode serves as a valuable resource for Data Vault practitioners grappling with similar challenges in their modeling endeavors. Michael’s expertise provides clarity on how to navigate the intricacies of tracking deletions effectively within the Data Vault framework.

Data Mining in the Data Vault Architecture

Watch the Video

In our ongoing series, CEO Michael Olschimke addresses a viewer’s question:

“We have a data mining model to be applied during information delivery. Where does it fit in the Data Vault 2.0 architecture?”

The viewer inquires about integrating a data mining model into the Data Vault 2.0 architecture specifically for information delivery. They seek guidance on where this data mining aspect fits within the broader Data Vault framework.

Michael delves into the topic of Data Mining in the context of Data Vault 2.0. He provides insights into the strategic placement of data mining models within the architecture, emphasizing their role in enhancing information delivery processes. Michael’s response sheds light on how organizations can effectively leverage data mining techniques to extract valuable insights while adhering to the principles of the Data Vault methodology.

This episode serves as a valuable resource for those navigating the intersection of data mining and Data Vault 2.0, offering practical guidance on seamlessly integrating data mining models into the architecture.

Data Vault Naming Conventions

Watch the Video

In our continuous series, CEO Michael Olschimke addresses a viewer’s question reading naming conventions in Data Vault:

“What naming conventions do you recommend for the Data Vault model?”

The viewer seeks advice on the recommended naming conventions for structuring the Data Vault model. Recognizing the significance of clear and standardized naming in data modeling, the question focuses on eliciting practical insights and guidelines.

Michael shares his expertise on effective naming conventions tailored for Data Vault models. He emphasizes the importance of consistency, clarity, and meaningful names to enhance the comprehensibility and maintainability of the Data Vault structure. By providing practical recommendations, Michael aids viewers in establishing robust naming conventions aligned with best practices in Data Vault modeling.

This episode serves as a valuable resource for data professionals aiming to optimize their Data Vault models through well-defined and organized naming conventions.

Modelling the Date Dimension in Data Vault

Watch the Video

In our continuous series, CEO Michael Olschimke delves into a question from the audience about how to model the date dimension in Data Vault:

“In many data sources we get data with a DATE data type. In some cases we want to use a Time-Dimension for this fields. How would you model this in Data Vault:

  •  As Time-Hub in Raw Vault and referencing that Hub in a Link?
  • As Time-Reference Table and then joining that in the IM? Should the Time Dimension hold a Hash Key as Dimension Key for that, or the Business Key (date)?
  • Or both options?”

The viewer raises a pertinent query regarding the modeling of DATE data types from various sources within the Data Vault modeling framework. The focus is on incorporating a Time-Dimension for these date fields, presenting multiple options for consideration.

Michael explores potential solutions, shedding light on two prominent strategies:

Time-Hub in Raw Vault: Creating a dedicated Time-Hub in the Raw Vault and referencing it in a Link. This approach involves establishing a distinct hub for time-related data, providing a structured foundation for subsequent processing.

Time-Reference Table in Information Mart (IM): Alternatively, considering a Time-Reference Table in the IM and joining it as needed. The discussion delves into the nuances of choosing between a Hash Key and Business Key (date) for the Time Dimension, offering insights into the implications of each choice.

Michael’s insights provide valuable guidance for navigating the complexities of modeling date dimensions within the Data Vault paradigm. By weighing the pros and cons of different approaches, viewers gain a deeper understanding of how to effectively integrate time-related data into their Data Vault architecture.

Close Menu