
Bridge Table and Zero Code Impact in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a pertinent question from our audience.

“We are currently implementing a bridge table over a series of sprints. The table prepares a fact entity with many measure values that are added sprint by sprint. Some measures are based on other measures in the bridge table. Our issue is that the code to load the bridge table is already complex due to the many measures. It exceeds 800+ lines of code and requires constant reengineering when additional measures are added. Is there a more agile approach with less, maybe zero change impact on the existing code?”

In this insightful video, Michael explores strategies for building a bridge table in an agile and incremental fashion. The question prompts a discussion on addressing the complexity of the loading code and finding approaches that minimize change impact, ensuring a more flexible and adaptive development process.

The video offers practical insights and recommendations for streamlining the implementation of a bridge table, enhancing agility, and reducing the challenges associated with code maintenance in evolving data models.
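
To make the problem in the question concrete, here is a minimal sketch of one possible low-impact structure. It is not necessarily the approach recommended in the video. The idea is that each measure is registered as its own small unit and declares which other measures it depends on, so a new sprint adds a function instead of editing existing ones. The registry `MEASURES`, the decorator `measure`, the function `compute_row`, and the sample measures and column names are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: measure name -> (dependencies, function).
MEASURES = {}


def measure(name, depends_on=()):
    """Register a measure without touching the code of existing measures."""
    def decorator(func):
        MEASURES[name] = (tuple(depends_on), func)
        return func
    return decorator


@measure("revenue")
def revenue(row):
    return row["quantity"] * row["unit_price"]


@measure("cost")
def cost(row):
    return row["quantity"] * row["unit_cost"]


# Added in a later sprint: it builds on two existing measures,
# but the functions above stay unchanged.
@measure("margin", depends_on=("revenue", "cost"))
def margin(row):
    return row["revenue"] - row["cost"]


def compute_row(source_row):
    """Evaluate all registered measures in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in MEASURES.items()}
    row = dict(source_row)
    for name in TopologicalSorter(graph).static_order():
        row[name] = MEASURES[name][1](row)
    return row
```

In a database, the same idea would probably be expressed as one small view or computed satellite per measure rather than Python functions; the sketch only illustrates the dependency ordering.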

Boost ROI of Data Infrastructure with Automation

Watch the Webinar

Generating returns from a modern data infrastructure is tough. First, creating a central repository for easy data access requires a lot of upfront, traditionally manual work to set up data ingestion, mapping, metadata management, and so on. Changes in sources, tech stack, and taxonomies require more work. Or someone new comes on board and proposes building an entirely new model to answer the same business question. Typically, all this pushes the data team to take shortcuts to regain lost time, creating technical debt. In this webinar, we’ll explain how automation done right, following Data Vault 2.0 standards, will not only cut manual work but also solve problems of agility, uncertainty, and output quality, and ultimately provide the return you expect. Learn what can go wrong and how to get it right.

Watch Webinar Recording

Webinar Agenda

1. Common pitfalls in data management
2. How the problems were solved in the past: what worked and what didn’t
3. How Data Vault methodology combined with automation brings new solutions…
4. … and how this will save you time and money

Zero Key Concepts in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our trainer Marc Finger delves into an intriguing question posed by the audience.

“In Hubs, we add two ghost records: one with 0s (unknown/zero key) and another with f’s (sometimes called error key). In the loading of the stage, in which cases should we replace the generated hash key with the error key instead, and how? Right now, if the Business Key (BK) or combination of BKs is null, we are always replacing it with the zero key. My question is in which cases should we use the ffff key instead.”

In this informative video, Marc explores the usage and value of zero keys when loading links within the Data Vault framework. The question prompts a discussion on the considerations and scenarios where replacing the generated hash key with the error key, represented by ‘ffff,’ is beneficial.

The video provides practical insights and recommendations for optimizing the handling of ghost records and error keys, contributing to a more robust and efficient Data Vault implementation.
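
The practice described in the question can be sketched as follows. This is a minimal illustration under stated assumptions, not the rule set from the video: the 32-character MD5 hash, the `||` delimiter, and the trim/upper-case normalization are assumptions, and `is_erroneous` is a hypothetical flag standing in for whatever check decides that the error key applies.

```python
import hashlib

ZERO_KEY = "0" * 32   # ghost record for unknown / missing business keys
ERROR_KEY = "f" * 32  # ghost record sometimes called the error key


def hub_hash_key(business_keys, is_erroneous=False, delimiter="||"):
    """Return a hash key for one or more business key parts.

    `is_erroneous` is a hypothetical flag set by an upstream check; when it
    is true, the error key is returned instead of a computed hash.
    """
    if is_erroneous:
        return ERROR_KEY
    # Assumed normalization: trim whitespace and upper-case each part.
    parts = [None if p is None else str(p).strip().upper() for p in business_keys]
    # All parts NULL (or empty): there is nothing to hash, so use the zero key.
    if all(p is None or p == "" for p in parts):
        return ZERO_KEY
    joined = delimiter.join("" if p is None else p for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

Which situations should set the flag, and therefore map to the error key rather than the zero key, is exactly the question the video discusses.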

Realtime Architecture in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke addresses a thoughtful question from our audience.

“What additional steps are there in a Real-Time loading pattern on top of the batch loading pattern?”

In this concise yet informative video, Michael focuses on the nuances of incorporating real-time loading patterns into the Data Vault 2.0 architecture. The question prompts a discussion about the specific steps that distinguish real-time loading from the traditional batch loading pattern.

Michael shares insights into the additional considerations and steps required to ensure the effectiveness of real-time data integration. The discussion provides valuable guidance for those looking to enhance their understanding of real-time loading within the context of the Data Vault 2.0 framework.

How to Get Data Out of Data Vault

Watch the Webinar

Data Vault is a very flexible model for creating a scalable data warehouse design. This flexibility comes from splitting the data into three basic entity types: keys, relationships, and descriptive data. The result, however, is a bigger model with more entities than a third normal form (3NF) or star schema model. A common complaint is that querying data from the Data Vault is difficult and inefficient.
In this webinar, we will show you that the opposite is true and what is needed to achieve it.

Watch Webinar Recording

Webinar Agenda

1. Data → Information → Business Value
2. Requirement gathering
3. PITs and bridges
4. Information marts

Realtime Delta Application in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke tackles a compelling question from our audience.

“From one of our source systems, we receive customer information in real-time messages. These messages are set up in a way that they always contain the business key(s) to identify the customer and the attribute(s) that have been entered, changed, or set to null.

For example, the first message includes customer creation with various attributes, including an email address and, of course, a Business Key. A second message later might include the Business Key, a new email address, and a new attribute (e.g., birthplace) that did not exist in the previous one.

We have a list of attributes that the end user is interested in, in our Information Mart, specifically in the Customer dimension. How would you manage this in the Data Vault architecture?”

In this enlightening video, Michael begins by discussing the design often encountered in such scenarios, highlighting areas for potential improvement. He then shares his insights on how to efficiently manage the message stream within the Data Vault 2.0 model. The discussion touches on optimizing the design for better usability in the Information Mart while maintaining the integrity and efficiency of the Data Vault architecture.
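
The message pattern in the question can be illustrated with a small sketch. It assumes that an attribute missing from a message keeps its previous value, while an attribute present with a null value has been explicitly cleared. The names `apply_message`, `replay`, and `customer_id` are hypothetical, and this is only one way to rebuild the current customer row, not the design proposed in the video.

```python
def apply_message(current, message):
    """Merge a partial message into the last known state of one customer.

    Attributes absent from the message keep their previous value; attributes
    present with value None are treated as explicitly set to NULL.
    """
    merged = dict(current)
    merged.update(message)
    return merged


def replay(messages, key_field="customer_id"):
    """Apply messages in arrival order and return the state per business key."""
    state = {}
    for message in messages:
        key = message[key_field]
        state[key] = apply_message(state.get(key, {}), message)
    return state
```

Replaying a creation message followed by a message with a new email address and a new birthplace yields one row per customer that carries the latest value of every attribute seen so far, which is the shape a Customer dimension would need.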

Top 10 Salesforce Features – 2022 (German)

Watch the Webinar

In this webinar, we look at our top 10 Salesforce features of 2022, whether they are new additions or changes from this year's three major releases. How can you use these features, and what makes them particularly interesting?

Join us to get to know the new features of 2022.

Watch Webinar Recording

Webinar Agenda

1. Overview of the Salesforce year
2. Top 10 features
3. Conclusions for 2022

Same-as-links Business Rules in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke engages with a thought-provoking question from our audience regarding Same-as-Links.

“Sometimes mapping logic for Same-As-Links (SAL) requires complex ‘fuzzy’ business logic. When does the logic become too complex for the Raw Data Vault, and instead, the joining of similar tables from different sources becomes a Business Vault concern? It’s important to not have convoluted transformations in the Raw Data Vault, so where do we ‘draw the line’ on transformations being too convoluted/complex for a Raw Data Vault entity?”

In this enlightening video, Michael addresses the delicate balance between the Raw Data Vault, where no business logic is applied, and the Business Vault, where most business logic is implemented. He provides insights into recognizing when the mapping logic for Same-As-Links (SAL) becomes too intricate for the Raw Data Vault, prompting the shift to the Business Vault for handling complex transformations.

The discussion offers practical considerations and a clear perspective on drawing the line to maintain the efficiency and clarity of transformations within the Raw Data Vault.

Pivotizing Fact Measures in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke engages with a pertinent question from our audience.

“There are 6 measure values (float/decimal values) in the fact entity. In each row, typically 3 of them are NULL. Would it make sense to unpivot the data and encode this in a dimension for measure type? We also have measure values which are based on integers. Does it make sense to separate them into their own fact entity?”

In this insightful video, Michael delves into the considerations surrounding the structure of fact entities that hold multiple measure values. The specific scenario, in which some measures are NULL in each row, prompts a discussion on whether it is beneficial to unpivot the data and encode it in a dimension for measure type. Additionally, Michael explores the case of measure values based on integers and evaluates whether separating them into their own fact entity is a sound approach.

The video offers practical guidance and best practices for optimizing the design of fact entities in Data Vault models, ensuring efficiency and clarity in data representation.
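
To show what unpivoting means for the scenario in the question, here is a minimal sketch. The function name `unpivot_measures` and the output columns `measure_type` and `measure_value` are hypothetical, and whether this layout is advisable is the subject of the video, not something the sketch decides.

```python
def unpivot_measures(fact_row, key_columns, measure_columns):
    """Turn one wide fact row into one narrow row per non-NULL measure."""
    keys = {column: fact_row[column] for column in key_columns}
    return [
        {**keys, "measure_type": column, "measure_value": fact_row[column]}
        for column in measure_columns
        # NULL measures produce no row; a value of 0 is kept.
        if fact_row.get(column) is not None
    ]
```

A row with six measure columns of which three are NULL becomes three narrow rows, each tagged with its measure type, so NULLs are no longer stored but every query has to filter or pivot by measure type.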

About Information Marts in Data Vault 2.0 – Part 2

In the Data Vault 2.0 architecture, information marts are used to deliver information to the end-users.
Conceptually, an information mart follows the same definition as a data mart in legacy data warehousing. However, because a data mart is used to deliver useful information, not raw data, it has been renamed in Data Vault 2.0 to better reflect the use case.

“Classical” information marts

But the definition of information marts has more facets. In the book “Building a Scalable Data Warehouse with Data Vault 2.0” we present three types of marts:

  • Information Mart: used to deliver information to business users, typically via dashboards and reports.
  • Metrics Mart: used in conjunction with a Metrics Vault, which captures EDW log data in a Data Vault model. The Metrics Mart is derived from the Metrics Vault and presents the metrics in order to analyze performance bottlenecks or the resource consumption of power users and data scientists in managed self-service BI solutions.
  • Error Mart: stores those records that typically fail a hard rule when loading the data into the enterprise data warehouse.

Additional information marts

In addition to these “classical” information marts, we use additional ones in our consulting practice:

  • Interface Mart: this is more or less an information mart. However, the information is not delivered to a human being (e.g. via a dashboard or report). Instead, it is delivered to a subsequent application or, as a write-back, to the source system (for example, when using the enterprise data warehouse for data cleansing).
  • Quality Mart: the quality mart is again an information mart, but instead of cleansing bad data, it is used to report bad data. Essentially, it turns the business logic used to cleanse bad data upside down: only bad data, and at times ugly data, is delivered to the end-user, the data steward. This is often done in conjunction with data cleansing frontends where the data steward can either correct source data or comment and tag the exceptions.
  • Source Mart: again an information mart, but this time not using one of the popular schemas, such as star schemas, snowflake schemas or fully denormalized schemas. Instead, the information mart uses the data model of the source application, similar to an operational data store (ODS) schema. However, the Source Mart is not a copy of the data; it is a virtualized model on top of the Data Vault model that reflects the original structures. It is well suited to ad-hoc reporting and provides great value for many data scientists as well as power users.

This concludes our list of information marts. We have used them successfully in projects for our clients to better communicate the actual application of the information marts in their organization.

Let us know in the comments if you think this is helpful for you, too!

Redundancy in Dimensional Models in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke addresses a thought-provoking question from our audience.

“A company might have many assets. The asset dimension contains many descriptive fields describing the company, leading to redundancy. When does it make sense to separate the attributes into their own dimension? If a dimension attribute is often used in filtering, does it make sense to separate this into its own dimension?”

In this enlightening video, Michael delves into the considerations and decision-making process surrounding the design of dimensional models derived from a Data Vault model. Specifically, he explores the scenario where the asset dimension contains numerous descriptive fields, potentially leading to redundancy. Michael provides insights into when it makes sense to separate these attributes into their own dimension and discusses the factors influencing this decision.

Furthermore, the discussion extends to instances where a dimension attribute is frequently used in filtering and whether it warrants a separate dimension. Michael’s explanation offers practical guidance and considerations for optimizing the target models while managing redundancies effectively.

For those involved in data modeling and dimensional design within the Data Vault framework, this video provides valuable insights and strategic considerations.

Hash Keys vs Sequence Keys vs Business Keys in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke engages with a pertinent question posed by our audience.

“How do the loading cycles benefit from joining on a hash key or a Business Key, as opposed to a surrogate value?”

In this insightful video, Michael delves into the critical aspect of designing the Enterprise Data Warehouse (EDW) by examining three choices for identifying records in the Data Vault model. The specific focus is on the advantages and implications of joining on a hash key or a Business Key, contrasting these approaches with the use of a surrogate value.

Michael’s comprehensive exploration provides clarity on the impact these choices have on loading cycles within the EDW. By understanding the nuances of each option, viewers gain valuable insights into optimizing loading processes and achieving efficient data integration.

For those involved in the design and management of Data Vault models, this video offers practical considerations and strategic insights.
