Skip to main content
search
0

Processing Streaming Data in Data Vault

Watch the Video

In our continuous Data Vault Friday series, Julian Brunner, our BI Consultant, takes on an intriguing question posed by an audience member.

“We are receiving real-time messages in JSON format. Are there any best practices, on how to model real-time data?”

In response to this query, Julian delves into the nuances of handling real-time data streams and outlines the recommended practices for modeling such data in the Data Vault framework. He shares insights on the architecture, methodologies, and considerations crucial for effectively incorporating real-time JSON messages into the Data Vault.

For those navigating the complexities of integrating real-time data seamlessly into their Data Vault setup, Julian’s expertise provides valuable insights and practical solutions, making this Data Vault Friday session a must-watch.

Stop Bleeding Money! 10 Steps to Save on Your Resource Consumption

Watch the Webinar

All organizations working with data platforms have the challenge to use their available budgets efficiently.

Especially in the new world where everyone is going to the Cloud, there are many new functionalities that give organizations the option of using their funds more efficiently than ever.

But on the flip side, it also gives everyone the possibility of scaling IT expenses endlessly, which can sometimes end in a huge monthly bill.

This webinar is a small collection of ways to create and maintain a cost-efficient data infrastructure. There will be 10+ tips on what your organization and your team can do, to reduce your monthly bill without any negative effects on your overall performance.

Watch Webinar Recording

Webinar Agenda

1. Intro
2. 10 Steps
3. Honorable Mentions
4. Summary

Loading CDC Data Into the Raw Data Vault

Watch the Video

In our recurring Data Vault Friday series, Michael Olschimke, our CEO, addresses a thought-provoking question about integrating the CDC data into the Raw Vault.

“A record from change data capture has a column with the type of operation (CREATE / DELETE / UPDATE) and multiple timestamp columns and it is inserted in the staging area. The record is going to be loaded in append-only (insert-only) in the Raw Data Vault. Are there attention points between CDC and Raw Vault to pay attention compared to a traditional loading of files in the staging area?”

Michael delves into the key considerations and nuances that distinguish the loading process of CDC data into the Raw Vault from the conventional method of loading files in the staging area. He sheds light on the potential attention points, highlighting crucial aspects to bear in mind during the append-only (insert-only) loading process.

For those grappling with the challenges of integrating CDC data seamlessly into the Raw Vault, Michael’s insights in this Data Vault Friday session provide valuable guidance and actionable tips.

Modelling Timesheet Data in Data Vault 2.0

Watch the Video

In the continued exploration of Data Vault concepts in our Data Vault Friday series, our CEO, Michael Olschimke, addresses a pertinent question from the audience concerning the integration of timesheet information into an internal Data Warehouse that already encompasses a Raw Vault for Resources, Projects, and Allocations.

“We are working on our internal Data Warehouse. We already have a Raw Vault for Resources (people in the organization), Projects, and Allocations (plans), but now we need to add the “timesheet” information. How should we model this, including the need to track changes?”

He provides actionable insights into structuring the Data Vault to accommodate timesheet data seamlessly, ensuring that historical changes are captured and enabling a robust system for tracking and managing timesheet-related information.

If you’re grappling with similar considerations in enhancing your internal Data Warehouse, Michael’s expertise in this Data Vault Friday session sheds light on best practices for modeling timesheet data within the broader Data Vault 2.0 architecture.

Typical Mistakes in Agile Approaches and How to Avoid Them

Watch the Webinar

In our webinar ‘Typical Mistakes in Agile Approaches’ we’ll explore the world of Agile Project Management, introducing Scrum as a powerful framework.

We’ll dive into the Data Vault 2.0 methodology for data integration in DWHs. Additionally, we’ll also discuss common mistakes when transitioning from Waterfall to agile approaches, including challenges specific to Data Vault and Scrum, offering practical guidance.

Join us to uncover common pitfalls and mistakes encountered in Agile Project Management and how to avoid them.

Watch Webinar Recording

Webinar Agenda

1. Get started with project management
2. Let’s get to know Scrum and agile project management and where are the pitfalls?
3. How does agile project management fits Data Vault 2.0?
4. How to avoid the Pitfall of not delivering business value

Seamless Agile Project Management in the BI Landscape

Agile project management has gained significant popularity in the corporate world due to its emphasis on collaboration, customer feedback, and continuous development. This approach has found its way into Business Intelligence (BI) projects, yet many companies still encounter common challenges or struggle to fully embrace agile methodologies.

A critical aspect of agile project management in BI involves effective collaboration among different teams. In today’s fast-paced business environment, projects often necessitate contributions from multiple departments or teams with diverse skill sets.
However, communication gaps, conflicting priorities, and differences in work styles can pose significant obstacles to smooth coordination and project success. Overcoming these challenges requires fostering open communication channels, establishing clear protocols for inter-team interactions, and cultivating a culture of mutual respect and understanding.

In the context of Data Vault and BI projects, several touchpoints demand collaboration from various departments or teams. For instance, gathering background information about the data source, ensuring privacy compliance, and obtaining well-defined user requirements are all critical components. Implementing well-defined processes, where everyone understands their responsibilities, can streamline these tasks. For example, ensuring privacy tagging precedes the implementation of the Raw Data Vault to ensure accurate Satellite splits. Additionally, clear requirements from users or reporting teams, in the form of User Stories or Question Stories, are fundamental for smooth project execution. In the following, you can visualize a high level example of an Development process of an Dashboard in a typical BI Project and the different Teams/People needed.

blank

Another common issue in BI projects is the excessive focus on technical aspects. Teams often dedicate significant time to building the entire technological infrastructure, neglecting the timely delivery of business value. At Scalefree, we advocate prioritizing business value from the project’s outset. We endorse the tracer bullet approach as an effective method, contingent on well-defined requirements. These issues are just a few examples of the challenges often faced in BI projects.

 

If you’re unfamiliar with the tracer bullet approach, don’t worry. We’ll dive into this topic and more in our webinar, “Typical Mistakes in Agile Approaches”. We also will take a look at Scrum and the Data Vault 2.0 methodology and uncover common mistakes in agile Project Management and learn how to avoid them. Watch the recording here for free!

OUR NEW MONTHLY EXPERT SESSION

To address the growing interest and challenges in agile project management, Scalefree is introducing “Insight Agile Projects,” a monthly expert session designed to address questions, share insights, and enhance collective knowledge in agile project management. Join us every 2nd Thursday of the month to gain valuable insights from our experienced project management experts, covering a range of topics including requirements analysis, effort estimation processes, stakeholder and people management, and their integration with the Data Vault 2.0 methodology.

Don’t miss the opportunity to enhance your project management proficiency with our expert guidance. Mark your calendars for the upcoming webinar and join our monthly expert sessions to gain invaluable insights into agile project management.

Data Mesh & Data Vault: Raw Data vs. Information

Watch the Video

In the ongoing Data Vault Friday series, our CEO, Michael Olschimke, addresses a pertinent question raised by the audience, focusing on the intersection of Data Mesh and Data Vault methodologies.

“In a data mesh/federation approach, can you share the raw data vault or the business vault across the enterprise or does the share only happen in the information mart? I believe it may happen some raw vault tables of a data domain to be joined with a raw vault of another data domain.”

Michael delves into the intricacies of Data Mesh and Data Vault methodologies, shedding light on the dynamics of sharing data across different domains within an enterprise. He provides insights into the scenarios where sharing may occur, discussing the potential of joining Raw Vault tables from disparate data domains. This elucidation aims to guide practitioners in navigating the nuances of data sharing in a federated environment.

Reference Hubs and Effectivity Satellites in Data Vault

Watch the Video

In the ongoing Data Vault Friday series, our CEO, Michael Olschimke, delves into two insightful questions raised by our audience about Reference Hubs and Effectivity Satellites.

“Should the Satellite hang from the Master Hub or Link? Our preference was to hang from the Hub as it kept the model simple and also to keep the CDC straightforward. Another reason we are leaning toward this is, that the descriptive columns belong to the Master Hub so it would be ideal to keep it as a satellite under the hub.”

“Now with the link, if there is any change in the relationship between the Master Hub Column to the Reference Hub Column in the link we would like to capture it. And it is via effectivity satellites. If we have One Master Hub column and a lot of the Reference Hubs columns do we end up in a lot of Effectivity Satellites? Or just one effectivity satellite as the driving key is the Master Hub column?”

Michael provides insights into these considerations regarding Reference Hubs and Effectivity Satellites, offering guidance on the optimal structuring of Satellites and managing Effectivity in Link relationships.

Monitoring of a Data Vault-Powered EDW

Introduction

In this newsletter, we’ll discuss an overview of the possible ways to monitor your solution within an Enterprise Data Warehouse (EDW) built using Data Vault 2.0.
The monitoring approaches below focus on giving an overview of different possibilities to grant insights into the performance of your data warehouse solution. The goal is to gain useful information about the technical data of your EDW and use them to track errors, find bottlenecks inside loading and transformation processes, and overall boost the performance of your data warehouse.

In a webinar session dived deeper into this topic of technical testing and monitoring methods for a Data Vault powered EDW. It covered all EDW layers and provided performance insights with a focus on modeling metrics vaults and marts, emphasizing source data.

Watch the recording here for free!

What to expect

You will receive an overview of different metrics and types of metadata to monitor your solution. One step further, the modeling approach for capturing and later analyzing the data is covered for different layers of your EDW, starting with the classical metrics vault and ending with the metrics marts. Additionally, different areas of monitoring are covered, focusing on potential challenges inside your solution. The main focus of this newsletter is how to provide metadata and therefore gain useful insights.

Modeling Metadata

Four instances are covered, which hold useful technical data. These are the metrics vault and the metrics mart built on top of it, the meta mart, and lastly the error mart. Each of these have their own use, although the modeling and data source may resemble one another. The similarities and differences between the modeling approaches of these mostly technical tables and their data source are explained in the below sections.

Metrics Vault

The metrics vault is used to catch defined metrics, which originate from ETL/ELT data flows. It’s not a mandatory component of Data Vault, however when implemented, your solution will gain several advantages from technical capabilities, such as:

  • Error inspection
  • Root cause analysis
  • Performance metrics

In terms of data flows, it focuses on identifying errors, as well as granting insights into the origins of these failures, in order to take actions to prevent these errors from happening again in the future.

The metrics themselves might originate from different sources. These sources may be the data platform, the ETL/ELT, or ESW job orchestration tool you are using. For this reason, we recommend modeling these data sources using the same patterns and standards you use to produce the Raw Data Vault, this will provide a scalable solution to capture and process your metrics.

Metrics Mart

Inside the metrics vault, the data is modeled and optimized for storage, scalability, and flexibility. To analyze the gathered data, the model is needed to fit the end-user’s requirements. For this matter, the metrics mart is part of the information delivery layer. As the metrics vault is built exclusively to provide data for the metrics mart, the standards of the Data Vault 2.0 architecture have been followed with this approach. The metrics mart is a special-purpose Information Mart which is sourced mainly from the metrics vault. It can be assumed that both materialization of marts as tables and virtualization as views can be considered and performed. The virtualization approach may be preferred, as deployment speed benefits from this decision.

As mentioned above, the main data source for the metrics mart is the metrics vault, although it’s not limited to it. In some cases, the business may decide to couple business objects with metrics to track the performance of specific business objects. This also helps to further boost performance.

The data model of the metrics mart is not fixed and varies due to the needs of the end-user. Therefore, a dimensional model could be the best solution, but it doesn’t have to be. In some cases, the data originates from the tools themselves, in such cases, the original structure of these tools could fulfill the purpose of analyzing the data more efficiently.

Meta Mart

Metadata is stored directly in the meta mart. These marts don’t source from the raw data vault, thus there is no such thing as a meta vault as a source for the marts to virtualize them. Therefore, the meta mart is materialized as tables. The model is similar to other information marts. For this matter, the business requirements imply the exact model of the meta mart, as its main function is to provide data in such a way that businesses can make the most use of it. In some cases, it might be a Dimensional Model, in other cases, it could be modeled in third-normal form.

Error Mart

The error mart is another information mart. Therefore, it also relies on the business as to how to exactly implement the model, although in practice most of them are modeled using a dimensional model. The goal of the error mart is to catch errors, so-called “ugly data”. This data is rejected by your transformation tools, for example rejected data rows from your soft rules. As soft rules change over time, so do your error marts. These errors don’t source from the raw data vault, resulting in the absence of an Error Vault. The data for the error marts can come from a variety of sources, although most of it originates from the ETL / ELT engine.

Managing Metadata

As the section above has described how to model specified information marts and what the main purpose of those are, this section focuses on the details of “what to load”. For this matter, a closer look at general data itself, its origin, and use case is taken at this point.

Business Metadata

Business Metadata is defined as metadata with meaning for the business. Although, it is not completely defined who is responsible for this data, as it depends on the business itself, the data warehouse team is responsible for the management of the data. The meta mart covers this type of metadata.

Some metadata you should consider tracking include the following:

  • Business column names
    As the business may decide to use abbreviations in prefixes or suffixes, which need to be tracked in order to be more coherent for the business.
  • Business definitions
    As these definitions are the foundation for analyzing data in later processes. It should be a business description of the different attributes and tables. It is also an indicator for the necessity of the attribute or table. If the business can’t provide information about the column/attribute or table, it should be considered as to whether it is necessary within an information mart or if it needs to be redefined within the business.
  • Ontologies and taxonomies
    As it provides information about the business object behind the source data, such as the classification of the business object, the relationship to other objects or hierarchies.
  • Physical table and column names
    As they provide information about the relation between the business object names and the names of the physical tables and attributes. This information is used to associate the data and the business objects.
  • Record source
    As it describes the different record sources in business terms. Therefore, its goal is not to provide a reference to technical instances such as databases. Furthermore, it aims to describe them in a manner that the business meaning behind those sources is provided.
  • Table specification
    As they describe the purpose of the source table and data as well as the column names and keys.
  • Exception-handling rules
    As they provide a list of potential technical issues, such as potential errors or data quality issues and how the ETL / ELT process handles these errors.
  • Source system business definitions
    As they describe the business meaning of source attributes.
  • Business rules
    As they generate new data based on business terms. Developers need some understanding of these rules and transformations in order to comprehend the requirements. As business metadata, the business rules are described in a way that the business can understand these. It should be considered to divide between hard rules and soft rules.

Technical Metadata

As business metadata serves the business the most, the data warehouse team is most likely to benefit from technical metadata, which is also stored in the meta mart. For this reason, most of technical metadata originates from technical components, resulting in the following examples:

  • Source systems
    As it provides a technical description of the source systems, such as database names or flat file locations.
  • Data models
    These are not typical metadata, although they are a helpful asset. They provide information about relationships between the used tables and could be presented in a graphical form.
  • Data definitions
    As they describe the data in a technical way. This includes information about the table name, column name, data type, constraints, or default values.
  • Business rules
    Business rules are also a part of technical metadata. Here, specifically, the technical definitions of these rules are considered, as they need to be implemented in the ETL / ELT tools later on. Hard rules and soft rules should be separated, just as in the business metadata.
  • Volumetrics
    As they describe the evolution of data loads, table sizes, and growth patterns in order to estimate the workload of the data warehouse.
  • Ontologies and taxonomies
    Technical metadata also should provide information about ontologies and taxonomies, like abbreviations of terms and attributes, relationships, business key designations, parents, and hierarchies as well as re-defines.
  • Data quality
    This metadata provides information about data quality metrics, which may include standardization of source data.

Process Execution Metadata

Process execution metadata is not provided by the business or source applications like business metadata and technical metadata, but furthermore is generated by the data warehouse team to provide insights into the ETL / ELT processes. The main goal is to provide information for a better understanding of the data warehouse performance and thus potentially increasing this performance. Most of the data comes from the ETL / ELT tool you are working with, delivering the following metadata:

  • Control flow metadata
    As a control flow executes at least one data flow, it provides data about processes and data lineage of all elements of the data warehouse.
  • Data flow metadata
    As they provide data about the data flows themselves. This includes the data volumes and rejected rows of a data flow / data transformation.
  • Package metadata
    The package metadata contains a summary of the information about running a package, which usually executes a control flow.
  • Process metadata
    Most packages are executed by a scheduling application. The process metadata provides data about the process that has started the package.

The process execution metadata should be stored in the metrics vault and kept separated from the business metadata and technical metadata.

Conclusion

In this newsletter, we provided an overview of monitoring options for your Data Vault-powered EDW.
We covered different types of information marts, including their modeling approach, their data sources, and usage for businesses and their data warehouse team.
We learned about different types of metadata, their source and which purpose they serve. For this matter, we divide between business metadata, that describes your data in terms, that business can make the most use out of it, technical metadata, that holds technical information about the processed data, and process execution metadata, which provides information about your data processes and ETL / ELT pipelines.
Finally, this newsletter provides information about how and what metadata you can process to monitor your EDW according to the standards of Data Vault 2.0.

We will go even deeper in our webinar. Make sure to watch the recording for free! We are looking forward to seeing you there.

Natural Key vs. Technical ID in Data Vault

Watch the Video

In our continuous exploration of Data Vault concepts during the Data Vault Friday series, our CEO, Michael Olschimke, takes on a thought-provoking question from a member of our audience regarding natural keys.

“Can a work order ID or recommendation ID be considered a natural business key even though there is no business ‘meaning’ to these ids? or should I use the text description of the recommendation as the business key? The work order would have to be a composite key made up of details and a date for the work order which at that point seems like just using the ID makes more sense.”

Michael addresses the nuanced concept of Natural Key and Technical ID in the context of work order and recommendation IDs. The discussion provides valuable insights into making informed decisions about choosing business keys in scenarios where the identifiers lack inherent business meaning.

Snackable Data Insights: Analytics Meets Salesforce Data Cloud

Watch the Webinar

What are we going to do with all that data? Something you might ask yourself when you are checking the current size of your data lake or data warehouse.

In our webinar, we will explore together how we can make use of our data residing in Snowflake using Salesforce Data Cloud or Tableau. Extract value from your data now!

Join us to learn more about the newest analytics tools inside the Salesforce ecosystem and how you can use them in your data architecture.

Watch Webinar Recording

Webinar Agenda

1. Recap DDL ’23
2. Ask the Right Questions
3. Issues
4. (New) Tools

HL7 FHIR resources in Data Vault

Watch the Video

In our continuous exploration of Data Vault concepts during the Data Vault Friday series, our CEO, Michael Olschimke, takes on a compelling question posed by a member of our audience.

“How would you model data that is transmitted as HL7 FHIR resources in Data Vault?”

Michael delves into the intricacies of handling HL7 FHIR resources within the Data Vault framework. This session is a valuable resource for those seeking insights into the effective modeling of healthcare data encoded in HL7 FHIR resources.

Close Menu