
Multi-Active Satellites vs. Dependent Child Links in Data Vault Modeling


In the realm of Data Vault modeling, practitioners often encounter scenarios where multiple descriptive entries are valid simultaneously for a single business entity. Two primary modeling techniques address this complexity: Multi-Active Satellites (MAS) and Dependent Child Links. Understanding the distinctions between these approaches is crucial for designing efficient and accurate data warehouses.



Understanding Multi-Active Satellites

A Multi-Active Satellite is designed to store multiple instances of descriptive information related to a parent key, all valid at the same point in time. The parent can be either a Hub or a Link. This structure is particularly useful when an entity can have several concurrent attributes.

For example, consider an insurance policy that offers various coverage details, each with different validity periods. Here, the policy (Hub) is associated with multiple coverages, each represented as a row in the Multi-Active Satellite, capturing the distinct validity periods and coverage amounts.
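To make the grain concrete, here is a minimal sketch of Multi-Active Satellite rows for this policy example. The column names (hk_policy, coverage_type, and so on) and hash values are illustrative assumptions, not a prescribed standard:

```python
# Illustrative Multi-Active Satellite rows: one policy, two coverages,
# both valid in the same load. Column names and hash values are made up.
sat_policy_coverage = [
    {"hk_policy": "a1b2", "load_date": "2024-01-01", "coverage_type": "FIRE",
     "valid_from": "2024-01-01", "valid_to": "2024-12-31", "amount": 100_000},
    {"hk_policy": "a1b2", "load_date": "2024-01-01", "coverage_type": "FLOOD",
     "valid_from": "2024-03-01", "valid_to": "2024-12-31", "amount": 50_000},
]

# The multi-active grain: parent key + load date + an intra-group key
# (here coverage_type) is unique; parent key + load date alone is not.
keys = [(r["hk_policy"], r["load_date"], r["coverage_type"])
        for r in sat_policy_coverage]
assert len(keys) == len(set(keys))
```

The point of the sketch is the key structure: several rows share the same parent key and load date, so an additional intra-group attribute is needed to keep rows distinct.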

Defining Dependent Child Links

A Dependent Child Link is a Link entity that includes one or more additional key attributes. Together with the combination of business keys connected by the Link, these attributes uniquely identify incoming data records. This structure is also known as a degenerated link, peg-legged link, non-historized link, or transactional link.

For instance, in an invoicing system, an invoice (Hub) may have multiple line items. Each line item can be uniquely identified by combining the invoice identifier with a line item number, forming a Dependent Child Link.
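A sketch of how such a composite identifier might be hashed into a link key. MD5, the `||` delimiter, and the function name are common conventions chosen for illustration, not requirements of the method:

```python
import hashlib

def link_hash_key(invoice_id: str, line_number: int) -> str:
    """Hash the business key together with the dependent child key.
    MD5 and the '||' delimiter are common conventions, not requirements."""
    return hashlib.md5(f"{invoice_id}||{line_number}".encode()).hexdigest()

# Two line items of the same invoice produce distinct link rows:
hk1 = link_hash_key("INV-1001", 1)
hk2 = link_hash_key("INV-1001", 2)
assert hk1 != hk2
```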

Modeling Example: Order Line Items

When modeling order line items, there are two valid approaches:

  1. Create a non-historized, Dependent Child Link with a Non-historized Satellite containing the invoice footer details.
  2. Establish a non-historized, Dependent Child Link that includes both the key combination and the invoice footer details.

The choice between these methods depends on the specific requirements of the data model and the nature of the data being captured.

Modeling Example: Insurance Policies

Consider an insurance policy with different effective time windows—a scenario discussed in a previous session. In this case, attributes such as ValidFrom, ValidTo, and Amount are descriptive data attributes related to the business relationship between the Policy and Coverage.

The recommendation is to keep these attributes together in a Multi-Active Satellite on a standard Link between Policy and Coverage. This approach ensures that all relevant information is stored cohesively, allowing for efficient querying and analysis.

Choosing Between Multi-Active Satellites and Dependent Child Links

The decision to use a Multi-Active Satellite or a Dependent Child Link hinges on the specific business scenario:

  • Multi-Active Satellites are ideal when multiple descriptive attributes of an entity are valid simultaneously, and these attributes need to be tracked over time. This structure allows for capturing the history of changes effectively.
  • Dependent Child Links are suitable when there is a need to uniquely identify records through a combination of keys, especially in transactional contexts where multiple related records exist, such as invoice line items.

It’s essential to assess the nature of the data and the business requirements to determine the most appropriate modeling technique.

Conclusion

Both Multi-Active Satellites and Dependent Child Links offer valuable structures in Data Vault modeling, each catering to different scenarios involving multiple concurrent records. By understanding their definitions, applications, and differences, data modelers can make informed decisions to design robust and efficient data warehouses.

Watch the Video

Meet the Speaker


Trung Ta
Senior Consultant

Trung has been a Senior BI Consultant since 2019. As a Certified Data Vault 2.0 Practitioner at Scalefree, his areas of expertise include data warehousing in cloud environments as well as Data Vault 2.0 modeling and implementation, especially, though not exclusively, with WhereScape 3D/RED. He has been working with industry leaders in the insurance and finance sectors, advising them on building their own Data Vault 2.0 solutions.

Interview with Julien Redmond, Creator of IRiS


Welcome to another edition of Data Vault Friday! I’m Michael Olschimke, CEO of Scalefree, and every Friday at 11 o’clock, we dive into discussions about Data Vault, data mining, cloud computing, and any data-driven applications. Today, we have a special guest—Julien Redmond from Ignition Data in Australia, who’s been working with us as a partner. Julien has developed the IRiS Data Vault automation tool, and he’s here to share insights about this innovative solution.



The Global IRiS Tour

Julien has been traveling the globe, promoting IRiS and ensuring that everyone knows about this groundbreaking tool. IRiS focuses on simplifying the data engineering aspects of Data Vault, rather than the modeling tasks, making it accessible and easy to use. Julien’s goal was to create a process so straightforward that anyone could learn it in less than a day. This simplicity allows teams to make Data Vault tasks repeatable and manageable, even for junior members.

What Sets IRiS Apart?

With so many Data Vault automation offerings available, IRiS stands out by addressing common pain points. The tool aims to minimize the steep learning curves often associated with other automation tools and facilitates seamless knowledge transfer between experienced and new users. It’s designed to integrate smoothly with existing data management platforms—whether that’s Microsoft Data Factory, AWS Glue, or other established tools—without disrupting current systems.

Seamless Integration

IRiS requires a minimal amount of metadata, which can be easily extracted from any modeling tool. This means there’s no new modeling interface to learn—just feed the metadata into IRiS, and it generates the necessary stored procedures and data definition scripts. This integration approach ensures that companies can leverage their existing platforms while adding powerful Data Vault automation capabilities.

Empowering Data-Driven Organizations

IRiS supports a range of target platforms like Databricks, Snowflake, and Microsoft tools, aligning with the growing trend of moving towards Lakehouse architectures. Organizations can incrementally move data into the Lakehouse based on specific use cases, promoting value-driven design and delivery. Julien emphasized that IRiS is lightweight, inexpensive, and comes as a single container—making it easy to deploy and use without significant overhead.

Learning and Community Support

One of the standout features of IRiS is its supportive learning environment. It includes an online training program with six hours of videos, a comprehensive playbook blending Data Vault methodology with practical user guidance, and access to a knowledge hub with tips and tricks. New users can get up to speed quickly, reinforcing their learning with a supportive community ready to help when needed.

Future of IRiS

Julien’s global tour reflects the excitement and confidence behind IRiS. As he visits partners worldwide—from Finland to the US—he’s spreading the word about how IRiS can transform Data Vault engineering, especially for organizations invested in cloud platforms. The response so far has been overwhelmingly positive, with teams appreciating how IRiS fits into their existing infrastructures while simplifying their workflows.

That wraps up this special session of Data Vault Friday! Thanks for joining us, and a big thanks to Julien for sharing his journey with IRiS. We’ll return to our usual Q&A format next time, so be sure to bring your questions. Until then, have a fantastic weekend!

Watch the Video

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Microsoft Fabric as an Enterprise Data Platform


Introduction to Microsoft Fabric and dbt Cloud

In today’s digital world, organizations need a unified, scalable, and collaborative data platform to power analytics, AI-driven insights, and business intelligence. Enter Microsoft Fabric—a comprehensive, role-based, SaaS-delivered data platform that brings together key Azure services on a single OneLake foundation with built-in AI capabilities.

In this article, we’ll explore how Microsoft Fabric can serve as your enterprise data platform, how it integrates with data modeling tools like dbt Cloud, and a proven “medallion” reference architecture that takes you from raw data ingestion to business-ready information marts. We’ll also discuss future extensions, practical limitations, and best practices to guide your journey.

Microsoft Fabric as an Enterprise Data Platform

This webinar covers leveraging Microsoft Fabric to implement a modern, end-to-end data platform. You will learn how the different Fabric services can be combined to implement a medallion architecture, supported by Data Vault 2.0 and dbt Cloud. A live demo shows lakehouses, warehouses, and Hubs, Links, and Satellites in a real-world scenario!

Watch webinar recording

Quick Primer: The Data Vault Methodology

Before diving into Fabric, it’s helpful to understand the Data Vault approach—an architecture pattern that brings agility, auditability, and scalability to your data warehouse. It comprises three core components:

  • Business Keys: Unique identifiers of business objects (e.g., customer number in a CRM).
  • Descriptive Data: Attributes that describe business keys (e.g., customer name, birthdate), which evolve over time.
  • Relationships: Linkages between business keys (e.g., customer–order relationships in a CRM).

By separating these elements into hubs, satellites, and links, Data Vault provides a repeatable, auditable framework for loading and tracing data lineage, perfectly suited for modern cloud platforms.
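As a toy illustration of that separation, a single source record can be decomposed into hub, link, and satellite parts. The record layout and component names below are hypothetical:

```python
# Hypothetical CRM record decomposed into Data Vault components.
record = {"customer_no": "C42", "order_no": "O7",
          "customer_name": "Ada", "order_total": 99.5}

hub_customer = {"business_key": record["customer_no"]}          # business key
hub_order = {"business_key": record["order_no"]}                # business key
link_customer_order = {"customer_key": record["customer_no"],   # relationship
                       "order_key": record["order_no"]}
sat_customer = {"parent_key": record["customer_no"],            # descriptive data
                "customer_name": record["customer_name"]}
```

Each component can then be loaded with its own repeatable pattern, which is what makes the approach automatable.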


Microsoft Fabric: Core Front-Ends and Services

At its heart, Microsoft Fabric brings together seven role-based “front-end” experiences, but three of them are key to enterprise data engineering and warehousing:

Data Factory

  • Data Flows: Low-code transformations (joins, aggregations, cleansing) via a Power Query-like interface.
  • Data Pipelines: Petabyte-scale ETL/ELT workflows with full control-flow constructs (if/else, loops).

Use case: Ingest raw data from relational, semi-structured, or unstructured sources into your landing zone lakehouse.

Data Engineering

  • Lakehouses: Unified storage for structured/unstructured data in Delta-Parquet format, with SQL endpoints for analytics.
  • Notebooks: Interactive Python, R, or Scala environments for data prep, analysis, and data science exploration.
  • Spark Job Definitions: Batch and streaming ETL jobs on Spark clusters.
  • Data Pipelines: Orchestrated sequences of collection, processing, and transformation steps.

Use case: Land raw data and expose it to data scientists or further transformation processes.

Data Warehouse

  • Warehouses: Relational-style databases with Delta-Parquet storage, instant elastic scale, and full transactional support.
  • Support for cross-warehouse queries and seamless read access to lakehouses.

Use case: Implement Data Vault’s Raw Vault, Business Vault, and Information Marts for BI consumption.

Intelligent Data Front-ends

Workspaces

All Fabric resources live inside workspaces, which group lakehouses, warehouses, notebooks, pipelines, and more. Workspaces enable:

  • Role-based access control and collaboration
  • Integration with Git for versioning and CI/CD
  • Cross-workspace data access via shortcuts

Integrating dbt Cloud with Microsoft Fabric

dbt Cloud is an industry-leading transformation framework that brings software engineering best practices to your data models: modular SQL, testing, documentation, and CI/CD. In Fabric, dbt Cloud:

  • Connects to a Fabric workspace as a data warehouse endpoint
  • Generates SQL models (SELECT statements), reading from lakehouses or warehouses
  • Executes those models natively on Fabric warehouses

Key benefit: dbt manages your Data Vault layers (hubs, links, satellites, and information marts) with clear lineage, testing, and version control—while Fabric handles execution, storage, and compute elasticity.

Reference Architecture: The Medallion Approach on Fabric

The modern “medallion” architecture separates data into three refinement layers—Bronze (raw), Silver (conformed/business), and Gold (BI-ready). Here’s how it maps onto Fabric:

Bronze (Landing Zone Lakehouse)

Data Factory pipelines copy raw relational, JSON, and unstructured files into a lakehouse. This fully persisted, immutable history remains read-only for most users.

Silver (Raw & Business Vault Warehouses)

  • Raw Vault Warehouse: dbt models generate staging views/tables with hash keys, load dates, and audit metadata.
  • Business Vault Warehouse: dbt builds hubs, links, and satellites based on business keys and relationships.
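A rough sketch of the staging step described above, in plain Python rather than dbt SQL. The normalization rule, hash choice, and column names are illustrative assumptions:

```python
import hashlib
from datetime import datetime, timezone

def stage(record: dict, business_key_col: str, record_source: str) -> dict:
    """Attach hash key, load date, and record source: the audit metadata
    a Raw Vault staging model typically carries."""
    bk = str(record[business_key_col]).strip().upper()  # assumed normalization
    return {
        **record,
        "hash_key": hashlib.md5(bk.encode()).hexdigest(),
        "load_date": datetime.now(timezone.utc).isoformat(),
        "record_source": record_source,
    }

row = stage({"customer_no": "c42", "name": "Ada"}, "customer_no", "CRM")
assert row["record_source"] == "CRM" and len(row["hash_key"]) == 32
```

In a dbt project, the same logic would typically live in a reusable macro so every staging model computes its keys and metadata identically.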

Gold (Information Mart Warehouse)

Information marts—star or snowflake schemas—are created via dbt models as optimized, query-ready tables for BI tools (Power BI, Tableau, etc.).

dbt Cloud and Microsoft Fabric – Medallion

Live Demo Highlights

During our webinar demonstration, we walked through:

  • Setting up a Fabric workspace and viewing lakehouse tables via SQL and the Windows Explorer integration
  • Using a Data Factory pipeline to ingest sample Snowflake data into a landing zone lakehouse
  • Authoring dbt models in dbt Cloud to create staging (hashing, load dates), hub tables, link tables, and satellites
  • Executing dbt runs that generate and run SQL in Fabric warehouses, and previewing results directly in the Fabric UI
  • Accessing all data files and Delta-Parquet tables seamlessly in Windows Explorer for multi-cloud portability

Outlook: Next-Gen Enhancements

Beyond the core implementation, here are exciting ways to evolve your Fabric-dbt platform:

  • Workspace Segmentation & Data Mesh: Create dedicated workspaces for medallion layers or business domains, and stitch multiple dbt projects together with dbt Mesh for a true data mesh design.
  • Real-Time Data Integration: Leverage Fabric’s built-in streaming capabilities to blend real-time feeds into your warehouses alongside batch data.
  • Enhanced Governance & Semantic Layers: Define and enforce semantic models both in dbt and in Fabric (via semantic models) to ensure consistent metrics across all BI tools.
  • Data Science Collaboration: Grant read-only access to bronze lakehouses and empower data scientists to use Fabric notebooks (Python, R, Scala) for ad-hoc analysis and advanced ML experiments.
  • Simplified Migration: Existing dbt projects on on-prem or other cloud warehouses can be repointed to Fabric with minimal code changes—especially when using community macros for Data Vault deployments.

Considerations & Limitations

While Fabric is powerful, be mindful of:

  • Write Support: The lakehouse SQL endpoint is read-only; SQL-based transformations must target Fabric warehouses.
  • Shortcut Management: Cross-workspace shortcuts must be manually maintained; frequent schema changes can add overhead.
  • Multiple Overlapping Tools: Data Factory, Data Engineering pipelines, notebooks, and dbt all offer ETL—establish clear standards to avoid confusion.
  • Product Maturity: As a relatively new platform, UI changes and minor bugs may appear; plan for iterative improvements.
  • Capacity Transparency: Compute and storage share capacity; monitor and size your Fabric capacity carefully to meet SLAs.

Conclusion

Microsoft Fabric, coupled with dbt Cloud, delivers an end-to-end Enterprise Data Platform that unifies data ingestion, storage, transformation, and consumption. By applying proven patterns like the medallion architecture and Data Vault methodology, you can build a scalable, collaborative, and governed environment, empowering both data engineers and business users to unlock insights faster.

Ready to take your data platform to the next level? Reach out for a tailored workshop, architecture advisory, or hands-on implementation support.

– Tim Kirschke (Scalefree)

Modelling Salesforce History Tables in Data Vault


Salesforce tracks changes to configured attributes by storing them in history tables. This data, which includes record ID, field name, old and new values, and timestamps, presents a unique challenge for Data Vault modeling. In this article, we’ll explore an optimal way to model this data using Data Vault principles.



Understanding Salesforce History Tables

Salesforce allows tracking of specific attribute changes within objects like Contacts. These changes are stored in history tables such as ContactHistory. Each entry logs:

  • Record ID (e.g., Contact ID)
  • Field Name
  • Old Value
  • New Value
  • Timestamp

Challenges in Modeling Salesforce History Data

When designing a Data Vault model for this history data, there are key challenges to consider:

  • Handling multiple changes for the same record within a short time frame
  • Maintaining referential integrity
  • Efficiently querying and pivoting data for reporting

Approach: Multi-Active Satellite

A common initial approach is to model the history table as a multi-active satellite attached to a Contact Hub, with the field name as the dependent key. However, this approach has pitfalls:

  • Duplicates can arise if multiple changes occur for the same field in the same batch
  • Timestamp-based keys are unreliable due to possible duplicate timestamps

To counter this, a unique sequence number should be assigned in the staging area and used as a dependent key.
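One way to sketch that staging-area sequence assignment (the column names and batch grain are assumptions for illustration):

```python
from collections import defaultdict

def assign_subsequence(rows):
    """Assign a sequence number per (record id, field, load batch) so that
    duplicate timestamps cannot collide in the satellite key."""
    counters = defaultdict(int)
    out = []
    for row in rows:  # rows assumed in source delivery order
        k = (row["record_id"], row["field"], row["load_date"])
        counters[k] += 1
        out.append({**row, "seq": counters[k]})
    return out

rows = assign_subsequence([
    {"record_id": "003A", "field": "Phone", "load_date": "2024-05-01", "new_value": "111"},
    {"record_id": "003A", "field": "Phone", "load_date": "2024-05-01", "new_value": "222"},
])
assert [r["seq"] for r in rows] == [1, 2]
```

The sequence number, not the timestamp, then serves as the dependent key, so two changes to the same field in one batch remain distinct rows.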

Optimized Approach: Non-Historized Link

Instead of a multi-active satellite, a non-historized link can be used to model Salesforce history data more efficiently. Here’s how it works:

  • Create a non-historized link connecting the Contact and User hubs.
  • Store change-related attributes (field name, old value, new value, timestamp) directly within this link.
  • Use the timestamp as an event-based attribute rather than part of the primary key.

This approach avoids the need for complex joins and simplifies querying.

Efficient Data Retrieval: Pivoting

Since history tables are structured in a key-value format, queries often require pivoting. By using database pivot functions, we can restructure the data into a more usable format for reporting without excessive joins.
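A minimal, database-agnostic sketch of such a pivot in Python; real implementations would typically use the platform’s PIVOT function, and the field names here are hypothetical:

```python
def pivot_history(changes):
    """Pivot key-value change rows into one dict of latest values per record.
    Assumes the rows are already ordered by change timestamp."""
    result = {}
    for c in changes:
        result.setdefault(c["record_id"], {})[c["field"]] = c["new_value"]
    return result

changes = [
    {"record_id": "003A", "field": "Phone", "new_value": "111", "ts": "2024-01-01"},
    {"record_id": "003A", "field": "Email", "new_value": "a@x.io", "ts": "2024-01-02"},
    {"record_id": "003A", "field": "Phone", "new_value": "222", "ts": "2024-02-01"},
]
assert pivot_history(changes)["003A"] == {"Phone": "222", "Email": "a@x.io"}
```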

Alternative Consideration: JSON Storage

Another approach is to store change data as a JSON object in a standard satellite. This method offers flexibility, particularly when dealing with a large number of attributes. However, it complicates querying and should be used only when necessary.
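A brief sketch of what storing a change as a JSON payload in a single satellite attribute might look like (attribute names assumed):

```python
import json

# A change event serialized into one satellite attribute (names assumed).
change_payload = json.dumps({"field": "Phone", "old": "111", "new": "222"})

# Flexible to store, but every downstream query now needs JSON parsing
# or the database's JSON functions:
restored = json.loads(change_payload)
assert restored["new"] == "222"
```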

Conclusion

For most cases, a non-historized link is the optimal way to model Salesforce history tables in Data Vault. It simplifies data storage, reduces the need for extensive joins, and enhances query performance. Multi-active satellites are an alternative but require careful handling of duplicate timestamps and field changes.

Watch the Video


Learning from DORA: Data Governance Lessons for All Institutions


Want to improve your organization’s ability to withstand digital disruptions? This webinar unpacks the key lessons from the Digital Operational Resilience Act (DORA), providing practical takeaways you can implement immediately. Discover how DORA’s advanced approaches to data management, risk mitigation, and operational resilience can be adapted to enhance your organization’s security posture, regardless of your sector.

Register now to secure your spot and enhance your digital resilience!

Webinar Details

  • Date: January 21st, 2025
  • Time: 11:00 – 12:00 CET
Watch Webinar Recording

Speakers


Lorenz Kindling
Senior Consultant

Lorenz works in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on data warehouse automation and Data Vault modeling. Since 2021, he has been advising renowned companies across various industries for Scalefree International. Before Scalefree, he worked as a consultant in the field of data analytics, which gave him a comprehensive overview of data warehousing projects and the common issues that arise.

Implement Data Tests to Enhance Data Quality

Data Quality Testing

In today’s data-driven world, poor data quality can lead to costly mistakes. From misguided strategic decisions to operational inefficiencies and poor customer experiences, the impact of bad data is far-reaching. Issues such as duplicates, data integrity failures, missing values, and inconsistent formats can create significant business challenges.



Why Early Detection Matters

Fixing data quality issues at the source or integration level is cost-effective and minimizes business disruption. In contrast, correcting errors at the business level is expensive and can severely impact operations. Implementing data tests early ensures smooth processes and reliable reporting.

Benefits of Data Testing

  • Trust in Data – Enables confident decision-making and reliable analytics.
  • Process Efficiency – Automates quality checks and reduces manual work.
  • Business Protection – Safeguards reputation and enhances customer satisfaction.
  • Risk Reduction – Provides early warnings and ensures compliance.

Key Data Tests in Coalesce

To maintain high data quality, businesses should test for:

  • Custom business rules
  • Referential integrity
  • Value ranges
  • Uniqueness
  • Data types
  • Missing or null values

By implementing rigorous data tests, organizations can enhance data quality, minimize risks, and drive better business outcomes.

Watch the Video

The Power of Data Vault – Business Use Cases

Business Use Cases in Data Vault

In the world of data management and integration, businesses face many challenges. Data Vault is a methodology designed to address these challenges, offering a flexible and scalable solution for integrating and managing data across an enterprise. But when is it the right time to use Data Vault? Are there specific business scenarios where Data Vault’s power truly shines? This article explores the core benefits of Data Vault, its use cases, and how it can solve complex data integration problems.



Understanding the Pain Points

Before diving into when and where Data Vault is most beneficial, it’s important to understand the underlying pain points businesses face in their data management processes. According to Michael Olschimke, CEO of Scalefree, understanding the business pain points is crucial. If there is no significant problem, there may be no need for a new solution like Data Vault. The key is identifying situations where current methods fall short in handling data integration, privacy regulations, and evolving business rules.

The most common pain point is the challenge of data integration. Modern businesses typically operate with data spread across multiple sources, from internal systems to external data feeds. Integrating these data sources into a single, unified view is one of the biggest challenges. Whether you’re trying to generate reports, create dashboards, or analyze data for business insights, you need a consistent and reliable method to integrate data from diverse systems. This is where Data Vault excels.

The Core Strength of Data Vault: Data Integration

Data Vault is designed specifically for situations where integration is a priority. If a business needs to bring together multiple disparate data sources into a single framework for reporting, Data Vault offers a robust solution. Its flexibility allows businesses to combine data from different systems, apply various business logic, and present the data in a meaningful way.

In contrast to other methods, Data Vault shines when the data integration needs are complex. Simply dumping data into a data lake may seem like an easy solution, but it leaves businesses with the challenge of how to integrate these disparate datasets into a cohesive model. Without a clear method for integration, data lakes become isolated silos of information, and producing integrated reports becomes a significant challenge.

Addressing Regulatory Compliance and Privacy

In today’s data-driven world, businesses must also address regulatory requirements such as GDPR. One of the strengths of Data Vault is its built-in support for privacy and security regulations. When managing sensitive data, businesses need to ensure compliance with privacy regulations, including the ability to delete or anonymize personal data when necessary.

While other methods can also address regulatory concerns, Data Vault provides out-of-the-box patterns and solutions that are easy to implement and scale. For example, Data Vault allows businesses to securely store data, apply business rules, and remove or anonymize personal attributes without disrupting the overall data structure. This capability is crucial in today’s regulatory environment, where compliance is not just a best practice but a legal requirement.

Handling Changing Business Rules Over Time

Another key use case for Data Vault arises when businesses face changing business rules. Over time, companies evolve, and with this evolution comes changes in how data is processed and interpreted. For example, a business might need to apply different versions of a business rule to historical and current data, depending on when the rule was in effect.

Data Vault provides a solution to this challenge by separating the data transformation processes and storing them in the “business vault.” This separation allows businesses to apply different versions of business rules to different datasets. For instance, you might apply one rule to data from the previous year and a different rule to the current year’s data. This flexibility allows companies to adapt to new business requirements without overhauling their data architecture every time the rules change.
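A simplified sketch of applying rule versions by effective date, as described above. The rule contents (tax rates) and structure are invented for illustration:

```python
from datetime import date

# Hypothetical rule versions kept in the Business Vault; each applies to
# records whose event date falls into its effective window.
RULE_VERSIONS = [
    {"valid_from": date(2023, 1, 1), "valid_to": date(2023, 12, 31),
     "apply": lambda amount: amount * 1.19},  # old tax rate (invented)
    {"valid_from": date(2024, 1, 1), "valid_to": date(9999, 12, 31),
     "apply": lambda amount: amount * 1.21},  # current tax rate (invented)
]

def apply_rule(event_date: date, amount: float) -> float:
    """Select the rule version whose validity window covers the event date."""
    for version in RULE_VERSIONS:
        if version["valid_from"] <= event_date <= version["valid_to"]:
            return version["apply"](amount)
    raise ValueError("no rule version covers this date")

assert round(apply_rule(date(2023, 6, 1), 100.0), 2) == 119.0
assert round(apply_rule(date(2024, 6, 1), 100.0), 2) == 121.0
```

Because the Raw Vault keeps the unaltered source data, all rule versions can be recomputed against history at any time.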

Scalability and Flexibility

As businesses grow and their data needs become more complex, the scalability of their data management solutions becomes critical. Data Vault is highly scalable because it allows companies to add new data sources, apply new business rules, and adjust their data models as needed without requiring a complete redesign of their data infrastructure.

One of the most powerful features of Data Vault is its ability to “creatively destruct” incoming data. This means that data from different source systems can be broken down into fundamental components—such as business keys, relationships, and descriptive data. These components can then be recombined in any format or structure that suits the business’s reporting or analytical needs, whether that’s a star schema, flat tables, or any other target structure. This flexibility ensures that businesses can meet various use cases and reporting requirements using the same data platform.

Data Vault and Business Intelligence

In the realm of business intelligence (BI), Data Vault stands out as an effective method for managing large, complex datasets. It offers businesses the ability to handle multiple use cases, such as generating reports, analyzing trends, and forecasting future performance. Because it integrates data from multiple sources, it provides a single, reliable source of truth for reporting and analysis.

Unlike traditional BI systems, which often require multiple data platforms or complex ETL (extract, transform, load) processes, Data Vault allows businesses to use a single platform for all their BI needs. Whether you’re running operational reports, building data marts, or creating advanced analytics models, Data Vault’s flexibility ensures that businesses can handle various BI scenarios without the need for separate systems or tools.

Addressing Complex Data Models

While Data Vault is highly flexible, it can also become more complex as businesses face increasingly complex data models. The complexity arises when businesses deal with dirty data, unclear business key definitions, or overlapping data from different source systems. In these situations, Data Vault allows companies to address these challenges by adding new components to their data models, such as hubs, links, and satellites.

For instance, if a business has two different source systems with different business key definitions, Data Vault can create a new hub to store these keys and establish relationships between them. Similarly, when data quality is an issue, Data Vault allows businesses to add computed satellites to clean the data before it’s used for reporting or analysis. While these additional components can increase the complexity of the data model, they are essential for solving the challenges presented by messy, inconsistent, or incomplete data.

When Should You Use Data Vault?

Ultimately, the decision to use Data Vault depends on your business’s data requirements. If your data integration needs are relatively simple, or if you don’t have stringent privacy or regulatory requirements, other solutions might suffice. However, for businesses dealing with complex datasets, evolving business rules, and compliance challenges, Data Vault provides a comprehensive, scalable solution that addresses all these needs.

When evaluating whether Data Vault is the right choice for your organization, it’s essential to assess your current and future data needs. If you require robust data integration, the ability to apply different business rules over time, and compliance with privacy regulations, Data Vault is a powerful tool that can handle these challenges. Its flexibility and scalability ensure that it can grow with your business as your data needs evolve.

Conclusion

Data Vault is a powerful methodology for businesses that need to integrate complex data from multiple sources, apply evolving business rules, and comply with privacy regulations. While it may not be necessary for every business, for those facing challenges in these areas, Data Vault offers a robust, flexible, and scalable solution. By breaking down and restructuring data in a way that supports various reporting and analytical needs, Data Vault ensures businesses can keep up with the ever-changing demands of today’s data-driven world.

Watch the Video


Bridging Domain Ownership and Data Products in Data Mesh Using Data Vault 2.0

Data Mesh and Data Vault 2.0

The Data Mesh paradigm is revolutionizing how organizations manage and utilize their data. By decentralizing data ownership and treating data as a product, businesses can create a self-sufficient ecosystem that empowers teams and promotes collaboration. Here’s how Data Mesh principles align with Data Vault 2.0 to enhance data management and governance.



Key Principles of Data Mesh

  • Domain Ownership: Data is managed at the domain level, with domains defined by business needs, such as product categories or customer segments. Analytical and operational data become the responsibility of domain teams.
  • Data as a Product: Domains own analytical data, with a focus on usability and quality. Data contracts ensure consistency and reliability for consumers.
  • Federated Governance: Standards and governance frameworks enable interoperability and ensure that the entire data ecosystem remains cohesive.
  • Self-Service Data Platform: DevOps and platform teams support a self-service environment where data can be easily shared and accessed through managed self-service BI tools.

What Defines a Domain?

A domain in Data Mesh is characterized by:

  • Autonomous Operations: Independence in managing and delivering data products.
  • Cross-Functional Teams: Teams that bring together diverse skills to manage data effectively.
  • Governance Accountability: Responsibility for adhering to governance and quality standards.

Understanding Domain Ownership

Domain ownership emphasizes:

  • Quality and Usability Focus: Delivering reliable, easy-to-use data products.
  • Decentralized Control: Allowing domain teams to manage their data independently.
  • Responsibility for Data Products: Ensuring end-to-end ownership of data assets.

What is a Data Product?

Data products embody the following principles:

  • Treating data as a product with well-defined consumers.
  • Enabling self-service usability through intuitive tools.
  • Ensuring end-to-end ownership from creation to delivery.

Integrating Data Mesh with Data Vault 2.0

Data Vault 2.0 serves as a foundation for implementing Data Mesh principles. Its focus on scalable data warehousing complements Data Mesh by supporting decentralized ownership and ensuring high-quality data products. This integration allows organizations to create a robust, scalable, and governed data ecosystem.

By combining the decentralized, domain-driven approach of Data Mesh with the structured methodology of Data Vault 2.0, businesses can unlock the full potential of their data assets.

Watch the Video

How to Tackle GDPR with Data Vault

Understanding GDPR in the Context of Data Vault

GDPR compliance is a critical concern for organizations handling personal data. Data Vault, a well-structured data modeling approach, offers a robust solution for meeting GDPR requirements, particularly in two key areas: data security and data privacy.



Data Security in Data Vault

Data security involves protecting existing data from unauthorized access. Data Vault supports this through two levels of security:

  • Row-Level Security: This ensures that users can only access records relevant to them. It can be implemented via database row-level security features or view layers.
  • Column-Level Security: Attributes are separated based on security classification. Each classification is stored in a separate Satellite, with access granted accordingly.

By controlling access at both row and column levels, organizations can ensure compliance with GDPR’s data access requirements.

Data Privacy and Deletion in Data Vault

Data privacy focuses on removing personal data when required. Data Vault’s design allows for the physical deletion of personal data without affecting the integrity of the entire dataset. This is achieved through:

  • Satellite Splitting: Personal and non-personal data are stored in separate Satellites. When a deletion request is made, only the personal data Satellite needs to be altered.
  • Data Retention Policies: Different personal data attributes may have varying retention periods. Separate Satellites are created for attributes that must be deleted at different times.
  • Point-in-Time (PIT) Table Updates: When personal data is deleted, PIT tables are rebuilt to reflect the absence of that data.

This approach ensures that deleted data is no longer accessible or retrievable, aligning with GDPR’s right to be forgotten.
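The PIT rebuild step can be sketched in Python. This is a minimal illustration, not a prescribed schema: the table and column names are made up, and it assumes the common Data Vault 2.0 convention of a zero-key ghost record that PIT entries fall back to once a satellite row has been physically deleted:

```python
GHOST_KEY = "0" * 32  # ghost record hash key, stands in for "no data"

# Personal satellite, keyed by the parent hub's hash key.
personal_sat = {"HK1": {"name": "Jane Doe"}}

# PIT table: per hub key and snapshot date, the satellite row to join to.
pit = [{"hub_hk": "HK1", "snapshot": "2024-01-01", "personal_sat_hk": "HK1"}]

def delete_and_rebuild(hash_key):
    """Physically delete the personal data, then rebuild the PIT so that
    dangling references point at the ghost record instead of the deleted row."""
    personal_sat.pop(hash_key, None)
    for row in pit:
        if row["personal_sat_hk"] not in personal_sat:
            row["personal_sat_hk"] = GHOST_KEY

delete_and_rebuild("HK1")
```

After the rebuild, queries through the PIT resolve to the ghost record, so the deleted personal data is neither accessible nor retrievable.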

Access Control Lists (ACL) in Data Vault

Managing user access to data is another essential aspect of GDPR compliance. Data Vault facilitates this through an ACL system modeled using Hubs and Links:

  • A User Hub stores information about individual users.
  • A User Group Hub categorizes users into groups with shared permissions.
  • A Customer Hub and Bank Account Hub manage customer and account details.
  • A Link connects users, user groups, and customers.
  • An Effectivity Satellite records the time periods during which users have access to specific data.

By applying this structure, access control can be managed dynamically, ensuring that only authorized users can view or modify data.
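A minimal sketch of such an ACL check in Python, assuming the Effectivity Satellite is stored as date ranges per user/customer pair (the structure and names are illustrative, not a fixed model):

```python
from datetime import date

# Effectivity satellite on the user-to-customer link: each row records the
# period during which a user may access a customer's data.
effectivity_sat = [
    {"user": "u1", "customer": "c42", "from": date(2023, 1, 1), "to": date(2023, 12, 31)},
    {"user": "u1", "customer": "c42", "from": date(2024, 6, 1), "to": date(9999, 12, 31)},
]

def has_access(user, customer, on_date):
    """True if any effectivity row covers on_date for this user/customer pair."""
    return any(
        r["user"] == user and r["customer"] == customer
        and r["from"] <= on_date <= r["to"]
        for r in effectivity_sat
    )
```

Revoking access is then an insert into the effectivity satellite that closes the open date range, rather than an update or delete.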

Security vs. Privacy: A Crucial Distinction

When discussing GDPR, it’s essential to distinguish between security and privacy:

  • Security: The data remains in the system, but access is restricted based on security policies.
  • Privacy: The data is physically removed from the system when no longer needed.

Organizations should ensure that security officers and privacy officers handle these concerns separately to avoid misconceptions, such as assuming filtered data is deleted when it is still present in the database.

Conclusion

Data Vault provides a comprehensive approach to managing GDPR requirements through built-in security and privacy mechanisms. By structuring data appropriately and implementing proper access control and deletion strategies, organizations can achieve GDPR compliance efficiently.

Watch the Video

Meet the Speaker

Michael Olschimke


How to Explain Data Vault to Business Users?

How to Explain Data Vault

When introducing Data Vault to business users, it’s important to communicate its value in a way that resonates with them. Instead of focusing on the technical details, it’s best to highlight the benefits and business impact.



Understanding the Audience

Business users, including executives and commercial leaders, generally don’t need to understand the intricacies of Data Vault. They are more interested in the outcomes, such as data accessibility, security, and adaptability.

Explaining Data Vault with a Simple Analogy

Imagine you are building a house. As a homeowner, you don’t need to know every construction detail; you just want to ensure that it’s solid, safe, and has all the necessary features. Similarly, business leaders don’t need to understand the technical framework of Data Vault—they just want a reliable data management system that supports their decision-making.

Focusing on Business Value

Instead of using the term “Data Vault,” it’s often more effective to discuss the advantages of a managed data platform:

  • Data Security & Privacy: Ensures compliance with regulations like GDPR while securing sensitive information.
  • Data Integration: Consolidates structured, semi-structured, and unstructured data from multiple sources.
  • Auditability & Transparency: Provides full data lineage, ensuring that every data point can be traced back to its source.
  • Agile Data Delivery: Enables incremental delivery of insights, so business teams don’t have to wait months or years to see results.
  • Adaptability to Change: Easily adjusts to changes in source systems, business logic, and reporting needs.
  • Handling Multiple Business Timelines: Supports complex business requirements, including postdating and backdating of records.
  • Scalability: Handles large datasets and high-speed data processing.

Delivering Business Outcomes

Business leaders care about measurable outcomes. With a Data Vault-based platform, they can expect:

  • Improved decision-making with accurate, timely data.
  • Faster adaptation to market changes and customer demands.
  • Cost efficiency through a structured yet flexible data architecture.
  • Enhanced reporting and analytics with reliable data sources.

Making the Pitch to Business Users

When discussing Data Vault with executives, avoid technical jargon and focus on business goals. Instead of saying, “We use Data Vault 2.0 for data modeling,” say, “We have a data platform that ensures secure, auditable, and easily accessible insights to drive your business forward.”

By emphasizing real-world benefits, you can effectively communicate the value of Data Vault without overwhelming non-technical stakeholders with complexity.

Conclusion

Communicating the benefits of Data Vault to business users requires a shift from technical explanations to business value discussions. By framing the conversation around security, agility, and data-driven decision-making, you can successfully gain buy-in from stakeholders and demonstrate the impact of a well-managed data platform.

Watch the Video

Meet the Speaker

Michael Olschimke


Use Data Vault 2.0 to Tackle GDPR

Why Use Data Vault 2.0 to Tackle GDPR?

Today, we explore how Data Vault 2.0 can be a powerful tool for addressing the challenges posed by the General Data Protection Regulation (GDPR). GDPR requires organizations to protect the personal data of European citizens and grants individuals the “right to be forgotten”. This article outlines how Data Vault 2.0 can simplify compliance with GDPR while maintaining the integrity of your data warehouse.



Understanding GDPR and its Challenges

GDPR, implemented in 2018 by the European Union, sets strict rules for handling personal data. One key aspect is the right to be forgotten, allowing individuals to request the deletion of their personal information from an organization’s systems. For data warehousing and analytics, this can be particularly challenging as organizations often need to retain some data for analytical purposes while complying with GDPR’s deletion requirements.

The Data Vault 2.0 Approach to GDPR

Data Vault 2.0 provides a structured way to tackle GDPR compliance through its unique data modeling techniques. At its core, Data Vault separates data into three main components: Hubs, Links, and Satellites. Satellites are used to store descriptive attributes of business keys, and with GDPR, we can utilize a method called Satellite Splits to manage personal and non-personal data effectively.

Satellite Splits

Satellite splits involve creating separate Satellites for personal and non-personal data. For example:

  • Personal Satellite: Contains personal information such as names, addresses, and email addresses. This data must be deleted if a customer exercises their right to be forgotten.
  • Non-Personal Satellite: Stores non-identifiable data such as regions or generated technical data, which can be retained for analytics even after personal data is removed.

When a deletion request is received, you can simply delete the records from the Personal Satellite while retaining the non-personal data for analytical use. This ensures compliance with GDPR while preserving valuable business insights.
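As a rough sketch, the satellite split and the resulting deletion behavior can be modeled in Python (the keys, attributes, and sample values below are illustrative only):

```python
# Two satellites hanging off the same Customer hub, keyed by hash key.
personal_sat = {
    "HK1": {"name": "Jane Doe", "email": "jane@example.com"},
    "HK2": {"name": "John Roe", "email": "john@example.com"},
}
non_personal_sat = {
    "HK1": {"region": "EMEA", "segment": "Retail"},
    "HK2": {"region": "APAC", "segment": "Corporate"},
}

def forget_customer(hash_key):
    """Right to be forgotten: physically delete only the personal satellite
    rows; the non-personal data stays available for analytics."""
    personal_sat.pop(hash_key, None)

forget_customer("HK1")
```

Because the split puts every privacy-relevant attribute in its own satellite, the deletion touches exactly one table and leaves the rest of the model untouched.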

Addressing Privacy-Relevant Business Keys

One of the challenges with GDPR is managing business keys that are tied to personal data, such as social security numbers. If such keys are used in Hubs, deleting personal data becomes complicated. Here’s how Data Vault 2.0 handles this:

Using Artificial Hubs

To avoid using personal attributes as business keys, Data Vault 2.0 introduces artificial Hubs. These Hubs assign unique, non-identifiable numbers to replace personal identifiers. For example:

  • An artificial Hub might contain a generated number for each customer’s car insurance data.
  • A Link connects the artificial Hub to the personal data stored in a Satellite.

When a customer requests deletion, you delete the connection between the personal identifier and the artificial number in the Link. The artificial Hub remains intact, allowing you to retain non-personal data for analytics without risking re-identification.
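The artificial Hub pattern can be sketched in Python. This is a simplified illustration under stated assumptions: the personal business key is represented by a placeholder hash key, the surrogate is a randomized UUID (so it cannot be reverse-engineered), and the link is a plain mapping:

```python
import uuid

# Artificial hub: surrogate, non-identifiable numbers for, e.g., car insurance data.
artificial_hub = set()

# Link: connects the personal business key to the artificial surrogate.
link = {}

def register_contract(personal_hash_key):
    """Assign a randomized surrogate to a personal identifier."""
    surrogate = str(uuid.uuid4())
    artificial_hub.add(surrogate)
    link[personal_hash_key] = surrogate
    return surrogate

def forget_person(personal_hash_key):
    """Delete only the link row; the artificial hub and any non-personal
    satellites attached to it survive for analytics."""
    link.pop(personal_hash_key, None)

surrogate = register_contract("SSN_HASH_1")
forget_person("SSN_HASH_1")
```

Once the link row is gone, there is no path from the retained analytical data back to the person, which is precisely what prevents re-identification.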

Best Practices for Implementing GDPR with Data Vault 2.0

  • Avoid Personal Identifiers as Business Keys: Always opt for non-personal or artificial identifiers wherever possible to simplify the model.
  • Use Randomized Identifiers: Generate UUIDs or random sequence numbers to prevent reverse-engineering personal data.
  • Collaborate with Legal Teams: Work closely with legal experts to define which data can be retained and which must be deleted under GDPR.

By adhering to these practices, organizations can create a robust Data Vault model that simplifies GDPR compliance while maintaining data integrity and analytics capabilities.

Conclusion

Data Vault 2.0 offers a flexible and efficient approach to tackling GDPR challenges. By leveraging Satellite splits and artificial Hubs, organizations can balance regulatory compliance with business needs. While managing GDPR compliance may seem complex at first, the structured approach of Data Vault 2.0 ensures that your data remains both secure and useful.

For further learning, join the Data Vault Innovators Community or participate in Data Vault Fridays hosted by Scalefree. These resources provide valuable insights and opportunities to explore topics like GDPR, data warehousing, and more.

Watch the Video

Meet the Speaker

Lorenz Kindling

Lorenz is working in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on data warehouse automation and Data Vault modeling. Since 2021, he has been advising renowned companies in various industries for Scalefree International. Prior to Scalefree, he also worked as a consultant in the field of data analytics. This allowed him to gain a comprehensive overview of data warehousing projects and common issues that arise.

Creating Data Vault Stages

Data Vault Stages

The Data Vault methodology provides a robust framework for managing and organizing enterprise data. One of the foundational components of a Data Vault is the stage. In this guide, we’ll explore what Data Vault stages are, their importance, and how to create them effectively.



Understanding Node Types in Data Vault

Before diving into stages, let’s review the key node types in a Data Vault:

  • Stages: Temporary storage areas where raw data is preprocessed.
  • Hubs: Central entities containing unique business keys.
  • Links: Relationships between hubs.
  • Satellites: Contextual and descriptive data for hubs and links.
  • PITs (Point-in-Time Tables): Helper tables that optimize query performance.
  • Snapshot Tables: Historical states of data.
  • Non-Historized Links & Satellites: Used when historical tracking isn’t required.
  • Multi-Active Satellites: Support multiple active records for the same key.
  • Record Tracking Satellites: Track changes and versions of records.

Features of Data Vault Patterns

The Data Vault methodology leverages years of practical experience to deliver several key features:

  • Patterns Based on Expertise: Proven methods for efficient loading and processing.
  • Multi-Batch Processing: Handle multiple data batches simultaneously.
  • Automatic PIT Cleanup: Uses logarithmic snapshot logic for optimal performance.
  • Virtual Load End-Date: Allows insert-only processes by using calculated end dates.
  • Automated Ghost Records: Simplifies handling of missing or incomplete data.
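The virtual load end-date pattern in the list above can be illustrated in Python. This is a sketch under simplified assumptions: each satellite row carries a load date, and the end date is derived at read time as the load date of the next row for the same hash key (a high date for the current row), so the load itself stays insert-only:

```python
from datetime import datetime
from itertools import groupby

HIGH_DATE = datetime(9999, 12, 31)

def with_virtual_end_dates(rows):
    """rows: dicts with 'hash_key' and 'load_date'. Returns new dicts with a
    derived 'load_end_date' -- the insert-only equivalent of physically
    end-dating satellite records (in SQL this is typically a LEAD() window)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r["hash_key"], r["load_date"]))
    for _, group in groupby(ordered, key=lambda r: r["hash_key"]):
        group = list(group)
        for current, nxt in zip(group, group[1:] + [None]):
            out.append({**current,
                        "load_end_date": nxt["load_date"] if nxt else HIGH_DATE})
    return out
```

Because the end date is calculated rather than stored, no UPDATE statements are needed, which is what makes multi-batch, insert-only loading possible.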

Why Are Stages Important in Data Vault?

Stages play a critical role in the Data Vault architecture by enabling efficient data preparation and ensuring data integrity. Key benefits include:

  • Hash Keys & Hash Diffs: Provide unique identifiers for data integration and efficient change detection.
  • Load Date & Record Source: Track the origin and timing of data entries.
  • Prejoins: Combine data efficiently before it enters the vault.
  • Hard Rules: Implement strict validation and transformation logic.
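The hash key and hash diff calculations in the stage can be sketched in Python. This is a minimal illustration, assuming an MD5 hash over trimmed, upper-cased business key parts joined by a delimiter (a common Data Vault 2.0 convention; the exact hard rules and hash function vary by implementation):

```python
import hashlib

DELIMITER = "||"

def normalize(value):
    """Hard rules: trim whitespace, upper-case, treat None as an empty string."""
    return "" if value is None else str(value).strip().upper()

def hash_key(*business_keys):
    """Deterministic hash key over one or more business key parts."""
    payload = DELIMITER.join(normalize(k) for k in business_keys)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()

def hash_diff(attributes):
    """Hash diff over all descriptive attributes, sorted by name so that
    column order does not change the result."""
    payload = DELIMITER.join(normalize(attributes[n]) for n in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()
```

Because the normalization runs before hashing, the same logical key always produces the same hash key regardless of casing or padding in the source, which is what makes deduplication and delta detection reliable.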

How to Create a Data Vault Stage

Creating a stage in a Data Vault involves leveraging the right tools and techniques. For this, we recommend using Datavault4Coalesce, a powerful platform designed for Data Vault implementation. This tool simplifies the process by automating key tasks and ensuring best practices are followed.

Conclusion

Stages are a foundational component of the Data Vault methodology, enabling seamless data preparation and integration. By understanding their role and leveraging the right tools, you can ensure the success of your Data Vault implementation.

Watch the Video

Meet the Speaker

Tim Kirschke
Senior Consultant

Tim has a Bachelor’s degree in Applied Mathematics and has been working as a BI consultant for Scalefree since the beginning of 2021. He’s an expert in the design and implementation of BI solutions, with focus on the Data Vault 2.0 methodology. His main areas of expertise are dbt, Coalesce, and BigQuery.
