All Posts By

Michael Olschimke

Michael Olschimke is the Co-Founder and CEO of Scalefree and a "Data Vault 2.0 Pioneer" with over 20 years of IT experience. A Fulbright scholar and co-author of Building a Scalable Data Warehouse with Data Vault 2.0, Michael is a global authority on AI, Big Data, and scalable Lakehouse design across sectors like banking, automotive, and state security.

Timezone to Be Used for Timestamps in Data Vault

Watch the Video

As part of our Data Vault Friday series, our CEO, Michael Olschimke, answers a question from a member of our audience.

“In DV2.0 it is advised to use the UTC zone. How to store income timestamps from incoming sources that are in other time zones (e.g. GMT)? E.g. in Azure SQL server.”

Michael discusses best practices for handling timestamps from multiple time zones in the Data Vault 2.0 methodology. Starting from the recommendation to use UTC, he covers the considerations and strategies for storing incoming timestamps from sources that operate in other time zones, such as GMT.
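As a rough illustration of the general idea (not necessarily the approach recommended in the video), an incoming source timestamp can be normalized to UTC while the original value and its offset are kept alongside it. The sketch below assumes the source delivers naive timestamps at a known, fixed UTC offset; the function names `to_utc` and `staging_row` and the column names are hypothetical. Sources in named time zones with daylight saving time would need a time zone database (for example Python's `zoneinfo` module) instead of a fixed offset.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text: str, offset_hours: float) -> datetime:
    """Interpret a naive source timestamp at a fixed UTC offset and return it in UTC."""
    source_tz = timezone(timedelta(hours=offset_hours))
    naive = datetime.fromisoformat(local_text)
    # Attach the source offset first, then shift the instant to UTC.
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc)


def staging_row(local_text: str, offset_hours: float) -> dict:
    """Build a hypothetical staging record: UTC value plus the untouched source value."""
    return {
        "event_ts_utc": to_utc(local_text, offset_hours).isoformat(),
        "event_ts_source": local_text,            # original value, kept for auditing
        "source_utc_offset_hours": offset_hours,  # assumption documented per source
    }


if __name__ == "__main__":
    # A source at UTC+2 and a source at GMT (offset 0).
    print(staging_row("2024-07-01 14:30:00", 2))
    print(staging_row("2024-07-01 14:30:00", 0))
```

Keeping the source value next to the UTC value means the conversion can be re-checked later if the assumed offset for a source turns out to be wrong.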

This discussion is part of our Data Vault Friday series, a resource for data professionals working in data architecture.

Referencing Reference Tables in Data Vault

Watch the Video

In this episode of our Data Vault Friday series, our CEO, Michael Olschimke, takes up a modeling question from a member of our audience.

“Is it possible to have an m:n link between two reference tables (country to currency)?”

Michael discusses whether an m:n (many-to-many) link between two reference tables is feasible and what it implies for the model, using countries and currencies as the example.

He also covers the challenges and considerations that come with creating such relationships in the Data Vault framework.

Hierarchical Link by Using an Example in Data Vault

Watch the Video

With this week’s episode of Data Vault Friday, our CEO, Michael Olschimke, turns his attention to an insightful question about the use of a Hierarchical Link:

“Can you please explain in detail the hierarchical link using an example (different from the Bill of Material one, please)?”

In response, Michael explains how hierarchical links are modeled in the Data Vault framework, using an example other than the conventional Bill of Material scenario.

The aim is to give the audience a practical understanding of how hierarchical links apply in different contexts.

Capturing Temporal Data on Changing Relationships in Data Vault

Watch the Video

In the latest installment of our Data Vault Friday series, our CEO, Michael Olschimke, addresses a question from a member of our audience.

“We have received the relationship between investor and company with a PostingMonth for the last couple of months. Also, the ownership percentage for the relationship could change over time (see attached Excel for mock data :)). So our question is: should we take the Period as a part of the Investor_Company_Link? If yes, how can we track the relationship changes with Effectivity Satellite? Or do you think Multi-active link satellite is a better choice here?”

Michael explores how to model investor-company relationships when the ownership percentage changes across time periods. He weighs including the Period in the Investor_Company_Link against using an Effectivity Satellite or a multi-active link satellite to capture how these relationships evolve.

The discussion is useful for data professionals who need to represent changing relationships within the Data Vault framework.

Extending Existing Data Vault Model by GDPR-Identified Data

Watch the Video

In our ongoing Data Vault Friday series, our CEO, Michael Olschimke, answers a question from a member of our audience.

“Let’s assume that DWH is fed from many source systems and one of them (some minor one, called ‘XYZ’) exports customer data identified by PERSONAL_ID (no other identifier available). We already have HUB_CUSTOMER based on some other customer identifier, and the PERSONAL_ID attribute is stored in SAT_CUSTOMER_PD. But there is one important thing regarding customer data, there are cases where multiple rows in HUB_CUSTOMER have the same PERSONAL_ID in mentioned satellite (which means, that some of the customers have been registered multiple times in our core systems).”

In this episode, Michael examines how to integrate customer data from diverse sources when a unique identifier is missing and duplicate entries exist. He outlines an approach to handling this within the Data Vault framework, with practical recommendations for a coherent and accurate representation of customer information.

The discussion is useful for data professionals who consolidate customer data sets with differing identifier structures.

Metadata Translation in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke discusses a question from the audience.

“Our EDW should use English entity names for hubs, links, and satellites. However, our sources are in a variety of languages (English, and German mostly). Where is the best option to translate everything into English?”

Michael discusses how to keep entity names consistent across a multilingual landscape. He weighs the pros and cons of translating at the source level, during the ETL (Extract, Transform, Load) process, or within the EDW itself, and offers considerations for choosing among them based on the needs and characteristics of the project.

He emphasizes that the chosen translation strategy should align with the business objectives and the overall goals of the data warehousing initiative. The episode offers best practices for handling multilingual challenges in Data Vault projects.

Hiding Dimension Members in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a query from the audience, exploring the dynamics of managing data visibility in the DIMENSION information mart.

“How can a record be hidden in the DIMENSION information mart if it is no longer in use? Our Data Warehouse (DWH) features a hierarchy of region, division, and zone, which may undergo splitting or merging multiple times. The challenge is that the deleted event is not signaled from the source side, and only a full refresh captures new hierarchy information. Users desire a consistently current status reflected in both FACT and DIM tables.

1. To handle this, the current relation can be flagged and counted. This approach involves managing the relationship with a counter, allowing for effective tracking and visibility.

2. Additionally, the last relation needs to remain visible in the FACT table, ensuring that historical relationships are retained for reference.”

In this video, Michael elaborates on these strategies and on how to maintain data integrity and visibility within complex hierarchies while accommodating changes efficiently.

Sampling (DB Subsetting) Production Data in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke answers a question from the audience about creating lightweight EDW environments.

“In one of the previous webinars (‘EDW Environments’), you mentioned about best practices for creating your EDW environments. Let’s consider a configuration where we have 4 environments, DEV + TST and PRE_PROD + PROD. Moreover, assume that the PROD environment is very heavy in the meaning of data volumes and we simply cannot handle such amounts of data on PRE PROD and TST (data on TST env. will be anonymized). Do you have any advice on how to create lightweight environments from PROD?”

In this video, Michael discusses how to manage EDW environments with differing data volumes. He offers practical advice on creating lightweight versions of the production environment for development, testing, and pre-production, including data anonymization in the test environment and efficient use of resources across the stages of the EDW lifecycle.
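One common way to subset a large data set consistently, shown here only as a minimal sketch and not necessarily the method described in the video, is to select rows by a deterministic hash of the business key. Because the decision depends only on the key, the same customers, orders, or accounts are selected in every table that carries that key, which keeps the subset joinable. The function name `in_sample` and the sample keys are hypothetical.

```python
import hashlib


def in_sample(business_key: str, percent: int) -> bool:
    """Return True if the key falls into a deterministic sample of roughly `percent` percent."""
    digest = hashlib.md5(business_key.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash onto buckets 0..99.
    bucket = int(digest[:8], 16) % 100
    return bucket < percent


if __name__ == "__main__":
    # Hypothetical business keys; a real run would read them from the production source.
    keys = [f"CUST-{n:05d}" for n in range(1000)]
    subset = [k for k in keys if in_sample(k, 10)]
    print(f"kept {len(subset)} of {len(keys)} keys")
```

Raising the percentage only adds keys to the sample, so a 10 percent test environment is always contained in a 50 percent pre-production environment built with the same function.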

Reference Tables With Effectivity Satellites in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke answers a question posed by the audience.

“Do you use Effectivity Satellites also for Reference Data in Reference Satellites?”

This short question leads Michael to discuss the considerations and best practices for using Effectivity Satellites with reference data in Reference Satellites.

In the video, Michael discusses potential applications and benefits of Effectivity Satellites for managing reference data, and how the approach can add flexibility and temporal tracking to Reference Satellites.

Data Vault on Databricks

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a question from the audience about whether Data Vault 2.0 (DV2.0) is compatible with Databricks.

“There has been hype going on on LinkedIn about whether or not DV2.0 is suited to exist on Databricks. Many people disagree that it is. The most significant comments are ‘lots of joins,’ ‘performance getting data out,’ and ‘not suited for modern automation.’ The latter ties to tools creating generated code per object VS. parameterized pipelines.”

In this video, Michael examines whether Data Vault 2.0 is suited to the Databricks environment. He addresses the concerns raised: the number of join operations, data retrieval performance, and the fit with modern automation practices.

Michael offers a balanced perspective on using DV2.0 on Databricks and responds to the key points from the LinkedIn discussions.

Multi-temporal Source Data (SAP HRMS) in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke answers a question from the audience about modeling an SAP HRMS source that holds SCD type 2 data and about handling time-dependent information in Data Vault 2.0.

“Could you please guide us on how to model an SAP HRMS Source that holds the data in SCD type 2 in the source itself with an effectivity start date and end date for each change? What will be the best way to deal with time-dependent data in Data Vault 2.0?”

In this video, Michael gives practical guidance on modeling SAP HRMS source data that is already stored as Slowly Changing Dimension (SCD) type 2 in the source itself. He addresses how to handle time-dependent data in Data Vault 2.0, including best practices for managing the effectivity start and end dates of each change.

Michael also shares considerations and recommendations for handling time-dependent data scenarios in Data Vault 2.0 projects.

EDW Environments in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a question from the audience about a common challenge in data projects.

“I’m currently working on a project where the ‘environments’ (Dev, Prod, Test) are not well administrated. This topic is not mentioned at all in the DV2.0 methodology. Could you please elaborate on the roles of these environments and how to correctly use and manage them? As context, the problem faced at the moment by the company is that they’re not being able to test correctly and then implement. Also, the environments don’t necessarily count with the same information.”

In this video, Michael discusses the roles of the Development, Test, and Production environments in the context of the Data Vault 2.0 methodology. He addresses the challenges faced by the company and explains why well-administered environments matter for testing, implementing, and keeping data consistent across stages.

Michael shares practical guidance on using and managing environments and on establishing a robust environment strategy within the Data Vault framework.
