All Posts By

Michael Olschimke

Michael Olschimke is the Co-Founder and CEO of Scalefree and a "Data Vault 2.0 Pioneer" with over 20 years of IT experience. A Fulbright scholar and co-author of Building a Scalable Data Warehouse with Data Vault 2.0, Michael is a global authority on AI, Big Data, and scalable Lakehouse design across sectors like banking, automotive, and state security.

Modelling Currencies in Non-Historized Links in Data Vault (PART 1)

Watch the Video

As part of our ongoing Data Vault Friday series, our CEO Michael Olschimke engages with a thought-provoking question posed by a member of our audience.

“Is it a good idea to create a HUB for an ‘opposite side Account’? Or maybe we should go even further and try to merge the ‘opposite side Account’ with HUB_ACCOUNT? If yes, what about different IBAN formats in different countries? Do we really want to have accounts from all over the world in our HUB_ACCOUNT?”

In this informative video, Michael tackles the complexities surrounding the creation of a HUB for the “opposite side Account” and explores the possibility of merging it with HUB_ACCOUNT. Delving into the practical considerations of accommodating different IBAN formats across countries, he provides nuanced answers and considerations to guide the decision-making process.

For those grappling with the challenges of data modeling in the context of global account structures, this video offers valuable insights and practical solutions.

Load Date vs Snapshot Date in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke delves into a pertinent question posed by our audience about the use of a Snapshot Date.

“What is the purpose of the Snapshot Date and how does it relate to the load date?”

In this enlightening video, Michael explores the significance of both the load date (TS) and the snapshot date (TS) within a well-designed Data Vault 2.0 architecture. Acknowledging their crucial roles as timelines, he provides a clear and insightful explanation of how these dates function and interrelate in the context of Data Vault 2.0.

Understanding the nuances of these temporal elements is key to optimizing data management within the Data Vault framework, and Michael’s explanation offers valuable insights for both newcomers and experienced practitioners in the field.

Modeling Invoices in Data Vault

Watch the Video

As part of our ongoing Data Vault Friday series, our CEO, Michael Olschimke, engages with a relevant and practical question from our audience about Data Vault modeling.

“What are the best practices for modeling Data Vault table structure to store invoice data?”

In this concise yet informative video, Michael shares best practices for designing Data Vault table structures tailored to storing invoice data. He addresses key considerations and potential challenges, and recommends approaches that keep the solution robust and scalable.

For those seeking guidance on structuring Data Vault tables for invoice data, this video serves as a quick and insightful resource.

Real-Time Loading of CDC Packages in Data Vault – PART 2

Watch the Video

As part of our ongoing Data Vault Friday series, our CEO Michael Olschimke delves into a pertinent question posed by a member of our audience.

“I would be interested in some ideas about how to load data from Apache Kafka. In our case, we receive CDC data from DB servers over Apache Kafka.

One specific concern raised is about maintaining the correct sequence of data in Raw Vault when dealing with different partitions in a Kafka topic. This becomes particularly crucial in scenarios involving Change Data Capture (CDC) from database servers.”

In this enlightening video, Michael provides insightful ideas and strategies for effectively loading data from Apache Kafka while ensuring the integrity of the sequence in the Raw Vault. He tackles the nuances of handling different partitions within a Kafka topic, offering practical guidance to address challenges associated with maintaining data order.

For those navigating the intricacies of data loading from Apache Kafka, this video provides valuable insights and solutions.
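
To make the ordering concern in the question concrete: Kafka preserves the order of messages within a single partition, but not across the partitions of a topic. The sketch below shows one possible way to restore a global order before loading. It assumes that every CDC event carries a monotonically increasing sequence number from the source database; the field name `source_seq` and the function `merge_partitions` are hypothetical, and this is not necessarily the approach recommended in the video.

```python
import heapq


def merge_partitions(partitions):
    """Merge per-partition event lists into one globally ordered list.

    Assumption: each list is already ordered (Kafka keeps order within a
    partition) and every event has a hypothetical 'source_seq' field that
    increases monotonically at the source database.
    """
    return list(heapq.merge(*partitions, key=lambda event: event["source_seq"]))


# Two partitions of the same topic, each internally ordered.
partition_0 = [
    {"source_seq": 1, "op": "insert"},
    {"source_seq": 4, "op": "update"},
]
partition_1 = [
    {"source_seq": 2, "op": "insert"},
    {"source_seq": 3, "op": "delete"},
]

ordered = merge_partitions([partition_0, partition_1])
```

A merge like this only works on a bounded batch of events. In a continuous stream, a loader would additionally need a rule for deciding when no earlier event can still arrive.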

Logical Industry Models in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a thought-provoking question raised by a member of our audience.

“In our data warehouse architecture, we have integrated a Data Vault 2.0 approach with a logical industry model: once the raw data has been loaded into the Raw Data Vault, the data undergoes a transformation into the Business Vault. This follows the logical design, maintaining the Data Vault style but derived from the industry model. Subsequently, it is further transformed into the business access layer, forming an information mart.

However, despite the initial intent of adopting a best-of-breed approach, there’s a realization that somewhere along the way, agility was compromised.”

In this enlightening video, Michael delves into the challenges faced in a data warehouse architecture that combines a Data Vault 2.0 approach with a logical industry model. Specifically, he addresses the placement of logical vendor models within this framework, exploring ways to maintain agility in the process.

Real-Time Loading of CDC Packages in Data Vault – PART 1

Watch the Video

In the latest installment of our Data Vault Friday series, our CEO Michael Olschimke addresses a pertinent question posed by an audience member.

“I would be interested in some ideas about how to load data from Apache Kafka. In our case, we receive CDC data from DB servers over Apache Kafka.

Should the data be converted from AVRO/JSON format to database format in Staging / Raw Vault? Or should it be loaded directly in an unchanged format? What is the Best Practice here?”

In this insightful video, Michael provides practical guidance on loading data from Apache Kafka, specifically when dealing with CDC (Change Data Capture) information from database servers. He explores the options of converting data from AVRO/JSON format to a database format within Staging/Raw Vault versus loading it directly in its original unchanged format.

For those navigating the complexities of data loading from Apache Kafka, this video offers valuable insights and best practices to inform decision-making in data architecture.

NULL Business Keys in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO, Michael Olschimke, takes a moment to delve into a thought-provoking question raised by our audience.

“Should the Business Key be a NOT NULL column at the source?”

This succinct yet critical query is the focus of this brief but insightful video. Michael engages with the nuances of the business key, exploring whether it should be a mandatory, not-null column at the source. As he unpacks the considerations, the audience gains valuable insights into the implications and potential advantages of enforcing the not-null constraint on the business key.

For those seeking clarity on best practices surrounding business key management, this video provides concise guidance.

Managed Self-Service Industrialization in Data Vault

Watch the Video

As part of our continuing Data Vault Friday series, our CEO, Michael Olschimke, engages with a pertinent question posed by our audience.

“How does the industrialization work in Managed Self-Service BI?”

In this succinct yet informative video, Michael delves into the intricacies of the industrialization process within the realm of Managed Self-Service Business Intelligence (BI). The audience is treated to a valuable discussion on the methodologies and practices involved in streamlining and scaling BI processes for efficient and consistent outcomes.

Michael sheds light on the significance of industrialization in the context of Self-Service BI scenarios, providing insights that are relevant for both beginners and seasoned professionals in the field.

Sharding in Data Vault 2.0

Watch the Video

In our continuing Data Vault Friday series, our CEO, Michael Olschimke, engages with an intriguing question posed by our audience.

“How does sharding work in Data Vault 2.0?”

In this illuminating video, Michael takes us on a journey to explore the intricacies of sharding within the context of Data Vault 2.0. Delving into the technical aspects, he provides insights into the process of laying out data on a Massively Parallel Processing (MPP) cluster. Interestingly, Michael shares his expertise from the comfort of his personal MPP cluster located in his home basement, adding a unique and practical dimension to the discussion.

For those seeking a deeper understanding of sharding techniques and their implementation in Data Vault 2.0, this video serves as a valuable resource.

Lambda Architecture vs. Data Vault 2.0 Architecture

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke engages with a question that delves into the comparison between the Data Vault 2.0 architecture and the Lambda architecture for real-time systems.

“We are currently comparing the Data Vault 2.0 architecture with the Lambda architecture for real-time systems. Can you elaborate on the similarities and differences?”

In this enlightening video, Michael provides a comprehensive exploration of the distinctions and commonalities between the Lambda architecture and the Data Vault 2.0 architecture. The audience gains valuable insights into the considerations, strengths, and potential use cases of each approach, aiding in informed decision-making for real-time system implementations.

Overloaded Links in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a thought-provoking question raised by our audience regarding data warehouse management.

“In our data warehouse currently, there are 3 hubs. Now, business users are seeking new information from another table, one that contains the results of tests conducted on the business objects represented by the 3 hubs. This new table features foreign keys corresponding to the 3 hubs. Interestingly, per row, only 1 foreign key is filled, followed by the associated test results.

To address this scenario, we considered attaching a satellite to each of the hubs and populating it with data only if the relevant foreign key is set. Another option we explored was modeling it as a link between the three hubs. However, given that there is no other table depending on it, we are inclined to lean towards option 1, attaching satellites to the hubs.”

This nuanced discussion on handling data relationships and the potential risks associated with overloading links is further explored in detail in the accompanying short video.
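
Option 1 from the question, one satellite per hub that is populated only when the matching foreign key is set, can be sketched as a simple routing step. The column names (`hub_a_key`, `hub_b_key`, `hub_c_key`, `result`) and the function `route_test_results` are hypothetical. The question only states that the table has three foreign keys with exactly one filled per row, and the sketch is an illustration of that option, not the recommendation given in the video.

```python
from collections import defaultdict

# Hypothetical column names: one foreign key per hub, one filled per row.
HUB_FOREIGN_KEYS = ("hub_a_key", "hub_b_key", "hub_c_key")


def route_test_results(rows):
    """Group test-result rows by the single foreign key that is set."""
    satellites = defaultdict(list)
    for row in rows:
        filled = [fk for fk in HUB_FOREIGN_KEYS if row.get(fk) is not None]
        if len(filled) != 1:
            # The question states that exactly one key is filled per row.
            raise ValueError(f"expected exactly one foreign key, got {filled}")
        fk = filled[0]
        satellites[fk].append({"business_key": row[fk], "result": row["result"]})
    return dict(satellites)


rows = [
    {"hub_a_key": "A-1", "hub_b_key": None, "hub_c_key": None, "result": "pass"},
    {"hub_a_key": None, "hub_b_key": "B-7", "hub_c_key": None, "result": "fail"},
    {"hub_a_key": "A-2", "hub_b_key": None, "hub_c_key": None, "result": "pass"},
]

routed = route_test_results(rows)
```

Each key of `routed` stands for the satellite attached to one hub. A hub with no matching rows, here the third one, receives no satellite entries.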

User Spaces in Managed Self-Service BI in Data Vault

Watch the Video

In our engaging Data Vault Friday series, our CEO, Michael Olschimke, delves into an intriguing question posed by our audience.

“What is the purpose of a user space? Are there any variants?”

In this insightful video, Michael explores the user space concept within managed Self-Service Business Intelligence (BI). He explains the purpose of user spaces and the variants that exist.

Whether you’re new to the concept or seeking a deeper understanding, this short yet informative video serves as a valuable resource in comprehending the role and nuances of user spaces in the context of BI.
