
Implementing a Business Glossary: A Step-by-Step Guide

What is a Business Glossary?

A Business Glossary is a structured collection of business terms with clear definitions, ensuring consistency and accuracy across an organization. It serves as a single source of truth for terminology used in different teams and departments.

Why a Business Glossary is Essential

  • Standardized Terminology: Ensures that everyone uses the same definitions, reducing ambiguity.
  • Improved Communication: Minimizes misunderstandings between teams.
  • Enhanced Data Quality: Ensures consistency across reports and databases.
  • Supports Compliance: Helps meet regulatory requirements such as GDPR, ESG, and BCBS 239.


Key Benefits of a Business Glossary

  • Standardized terminology across teams
  • Faster and more accurate reporting
  • Easier regulatory compliance
  • Trustworthy, high-quality data

Challenges Without a Business Glossary

  • Data inconsistency across departments
  • Compliance risks (GDPR, ESG, BCBS 239)
  • Errors in reporting and decision-making
  • Wasted time fixing data discrepancies

Key Components of a Business Glossary

  • Term Name: The business term (e.g., “Customer”).
  • Definition: A clear, non-technical explanation.
  • Synonyms & Acronyms: Alternative names used across departments.
  • Owner: The person responsible for maintaining the term.
  • Business Rules: Conditions or constraints applied to the term.
  • Data Source: The official location of the data.
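
To make these components concrete, here is a minimal sketch of how a single glossary entry could be captured as a structured record; the Python representation, field names, and example values are purely illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """One business glossary entry; fields mirror the components listed above."""
    name: str                                            # Term Name, e.g. "Customer"
    definition: str                                      # Clear, non-technical explanation
    synonyms: list[str] = field(default_factory=list)    # Synonyms & acronyms
    owner: str = ""                                      # Person maintaining the term
    business_rules: list[str] = field(default_factory=list)
    data_source: str = ""                                # Official location of the data

customer = GlossaryTerm(
    name="Customer",
    definition="A person or organization that has purchased at least one product.",
    synonyms=["Client", "Account Holder"],
    owner="Head of Sales Operations",
    business_rules=["Must have a valid customer ID in the CRM"],
    data_source="CRM system, table dbo.Customer",
)
print(f"{customer.name}: {customer.definition}")
```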

How to Implement a Business Glossary

Step 1: Identify Key Business Terms

Start by finding the most commonly used yet misunderstood terms in your organization. These are the terms that frequently cause confusion or inconsistencies.

Step 2: Define the Terms

Get cross-team agreement on definitions, document all synonyms, and resolve any conflicts in terminology.

Step 3: Store & Publish the Glossary

Make the glossary accessible to everyone in the organization. Common platforms include Excel, SharePoint, or specialized tools like Collibra.

Step 4: Assign Ownership & Governance

Assign a data owner or steward to ensure ongoing updates and accountability.

Step 5: Monitor & Improve the Glossary

Conduct quarterly reviews, track data usage trends, and integrate the glossary into reports and workflows.

Step 6: Adapt to Industry Standards

Stay updated with new regulations and industry best practices to ensure your glossary remains relevant.

Key Takeaways

  • A Business Glossary improves data clarity, accuracy, and trust.
  • Assign owners and governance roles to maintain the glossary.
  • Start small with 15-20 key terms before scaling.
  • Monitor usage and resolve conflicts regularly.
  • Integrating a Business Glossary into your data governance framework enhances long-term efficiency.
  • Start with simple tools like Excel or SharePoint, then upgrade as needed.

Conclusion

Implementing a Business Glossary is a crucial step toward achieving data consistency, improving communication, and ensuring compliance. By following a structured approach, organizations can establish a reliable glossary that grows with their business needs.

Watch the Video

Meet the Speaker


Lorenz Kindling
Senior Consultant

Lorenz works in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on data warehouse automation and Data Vault modeling. Since 2021, he has been advising renowned companies across various industries for Scalefree International. Before joining Scalefree, he worked as a consultant in the field of data analytics, which gave him a comprehensive overview of data warehousing projects and the common issues that arise.

PII Business Keys: Best Practices for Artificial Hubs and Satellites

PII Business Keys

In modern data architecture, handling Personally Identifiable Information (PII) is a crucial aspect of maintaining data privacy and integrity. One common challenge in Data Vault modeling is determining how to properly load artificial hubs when using PII fields as business keys. In this article, we explore different approaches and best practices to handle this scenario effectively.



Understanding the Loading Process for Artificial Hubs

When an artificial hub is created using a PII-based business key, a key question arises: should we load one UUID per person, or should we generate multiple UUIDs for each version or change? Additionally, how do we manage this in the ETL process?

Solution 1: Using the Technical ID as the Business Key

One approach is to integrate data using the technical ID (e.g., employee ID from a CRM system) instead of the PII-based business key. This method ensures that the actual PII data remains in the satellite, making deletions easier while maintaining integrity in data integration.

Solution 2: Storing Both Technical and Business Keys in the Hub

Another option is to load both the technical ID and the PII-based business key into the same hub. A same-as-link can be used to create a mapping between the two, allowing flexibility in identifying records while ensuring that the satellites reference the technical ID for consistency.

Solution 3: Separating Technical and Business Keys into Two Hubs

For greater flexibility, technical IDs and business keys can be stored in separate hubs with a linking mechanism. While this approach introduces additional complexity, it provides a structured way to manage mappings between different keys while keeping PII data separate.

Managing UUIDs in the ETL Process

If the source system does not provide a technical ID, an artificial UUID must be generated. This requires maintaining a lookup table in the staging layer to map each business key to a UUID. This mapping must be handled efficiently in the ETL process to ensure consistency across data loads.
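
A minimal sketch of such a lookup, using an in-memory dictionary as a stand-in for the persistent staging table; in practice the table must be persisted (and protected, since it contains PII) so the same business key resolves to the same UUID on every load:

```python
import uuid

# Stand-in for a persistent lookup table in the staging layer:
# business key (PII) -> stable artificial UUID.
key_to_uuid: dict[str, str] = {}

def get_or_create_uuid(business_key: str) -> str:
    """Return the same UUID for a given business key on every load."""
    if business_key not in key_to_uuid:
        key_to_uuid[business_key] = str(uuid.uuid4())
    return key_to_uuid[business_key]

# The same person always resolves to one UUID across loads.
first = get_or_create_uuid("jane.doe@example.com")
second = get_or_create_uuid("jane.doe@example.com")
assert first == second
```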

Handling PII Deletions

When a delete request is received, it is essential to remove PII data while preserving relationships in the data model. Using the technical ID ensures that descriptive information remains, while direct identifiers are removed. Additionally, solutions like Delta Lake or Iceberg tables can help manage deletions effectively in a data lake environment.

Conclusion

Choosing the right approach for handling PII-based business keys depends on the specific use case and integration requirements. Using a technical ID simplifies integration but may not always be feasible. The same-as-link approach provides a balanced solution, while separate hubs offer greater flexibility at the cost of added complexity. Ultimately, a well-structured ETL process is key to ensuring data integrity and compliance with privacy regulations.

Watch the Video

Meet the Speaker


Marc Winkelmann
Managing Consultant

Marc works in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on Data Vault 2.0 implementation and coaching. Since 2016 he has been active in consulting on and implementing Data Vault 2.0 solutions with industry leaders in the manufacturing, energy supply, and facility management sectors. In 2020 he became a Data Vault 2.0 instructor for Scalefree.

Effectivity Satellites on Links

Watch the Video

Understanding Effectivity Satellites in Data Vault

Effectivity satellites play a crucial role in Data Vault modeling by tracking changes and deletions in source systems. In this article, we’ll explore when to use an effectivity satellite, how it differs from a regular satellite, and the best practices for implementing it.

What is an Effectivity Satellite?

An effectivity satellite is essentially a standard satellite used to capture business time attributes such as valid-from and valid-to dates, contract start and end dates, and deletion timestamps. The key distinction is that it tracks soft deletions from source systems, ensuring that historical data integrity is maintained in the Data Vault.

When Should You Use an Effectivity Satellite?

Effectivity satellites should be used when you need to track historical changes in relationships and entities, especially deletions. Common use cases include:

  • Tracking contract start and end dates
  • Monitoring employee and corporate car assignments
  • Managing customer records and their deletion status

Difference Between Regular and Effectivity Satellites

Regular satellites store descriptive attributes like names and addresses, whereas effectivity satellites focus on time-based attributes and deletion markers. While regular satellites track changes in data, effectivity satellites specifically manage record deletions and validity periods.

Choosing Between Link Satellites and Effectivity Satellites

Link satellites capture changes in relationships between entities, whereas effectivity satellites track the validity of those relationships. You should choose an effectivity satellite when:

  • Deletions need to be recorded without physically removing data
  • You need to track when a relationship was created and ended
  • Historical relationship integrity must be preserved

Example: Employee and Corporate Car Assignment

Consider an employee assigned to a corporate car. When the assignment changes, a new link entry is created between the employee and the new car. The effectivity satellite records the deletion timestamp for the old relationship and maintains the history of assignments.

Handling Deletions in Effectivity Satellites

One of the main challenges in effectivity satellites is detecting and handling deletions. Different data loading methods impact how deletions are recorded:

  • Full Loads: Compare current and previous loads to identify missing records.
  • Change Data Capture (CDC): Uses system flags to detect deletions.
  • Delta Loads: Require additional logic to identify removed records.
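
For full loads, for instance, deletion detection can be a simple key-set comparison between the previous and the current load; a minimal sketch with invented keys, where a detected deletion end-dates the relationship in the effectivity satellite instead of physically removing it:

```python
# Keys of the relationship (e.g. employee|car) seen in each full load.
previous_load = {"EMP-17|CAR-001", "EMP-23|CAR-002", "EMP-41|CAR-003"}
current_load = {"EMP-17|CAR-001", "EMP-41|CAR-003"}

deleted_keys = previous_load - current_load
for key in sorted(deleted_keys):
    # The effectivity satellite would record a deletion timestamp here,
    # preserving the full history of the relationship.
    print(f"Mark relationship {key} as deleted")
```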

Effectivity Satellites in Business Vault

In a Business Vault, effectivity satellites can be used to implement business rules for tracking deletions. For instance, a customer may be considered deleted only if removed from all source systems, which requires a business-driven deletion logic.

Conclusion

Effectivity satellites are essential in Data Vault modeling for tracking deletions and historical changes. Understanding their role and choosing the right satellite type ensures accurate data lineage and integrity.

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Multi-Active Satellites vs. Dependent Child Links in Data Vault Modeling

Multi-Active Satellites vs. Dependent Child Links

In the realm of Data Vault modeling, practitioners often encounter scenarios where multiple descriptive entries are valid simultaneously for a single business entity. Two primary modeling techniques address this complexity: Multi-Active Satellites (MAS) and Dependent Child Links. Understanding the distinctions between these approaches is crucial for designing efficient and accurate data warehouses.



Understanding Multi-Active Satellites

A Multi-Active Satellite is designed to store multiple instances of descriptive information related to a parent key, all valid at the same point in time. The parent can be either a Hub or a Link. This structure is particularly useful when an entity can have several concurrent attributes.

For example, consider an insurance policy that offers various coverage details, each with different validity periods. Here, the policy (Hub) is associated with multiple coverages, each represented as a row in the Multi-Active Satellite, capturing the distinct validity periods and coverage amounts.

Defining Dependent Child Links

A Dependent Child Link is a Link entity that includes one or more additional key attributes. Together with the combination of business keys connected by the Link, these attributes uniquely identify incoming data records. This structure is also known as a degenerate link, peg-legged link, non-historized link, or transactional link.

For instance, in an invoicing system, an invoice (Hub) may have multiple line items. Each line item can be uniquely identified by combining the invoice identifier with a line item number, forming a Dependent Child Link.
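
A common implementation convention (an assumption here, not a rule stated in this article) is to fold the dependent child key into the Link's hash key together with the Hub's business key; a sketch using delimited concatenation and MD5 hashing:

```python
import hashlib

def link_hash_key(*parts: str) -> str:
    """Hash business key(s) plus the dependent child key into one link key.
    The delimiter prevents collisions such as ("AB", "C") vs ("A", "BC")."""
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Invoice number (Hub business key) + line item number (dependent child key)
print(link_hash_key("INV-10023", "1"))
print(link_hash_key("INV-10023", "2"))  # a distinct key for each line item
```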

Modeling Example: Order Line Items

When modeling order line items, there are two valid approaches:

  1. Create a non-historized, Dependent Child Link with a Non-historized Satellite containing the invoice footer details.
  2. Establish a non-historized, Dependent Child Link that includes both the key combination and the invoice footer details.

The choice between these methods depends on the specific requirements of the data model and the nature of the data being captured.

Modeling Example: Insurance Policies

Consider an insurance policy with different effective time windows—a scenario discussed in a previous session. In this case, attributes such as ValidFrom, ValidTo, and Amount are descriptive data attributes related to the business relationship between the Policy and Coverage.

The recommendation is to keep these attributes together in a Multi-Active Satellite on a standard Link between Policy and Coverage. This approach ensures that all relevant information is stored cohesively, allowing for efficient querying and analysis.

Choosing Between Multi-Active Satellites and Dependent Child Links

The decision to use a Multi-Active Satellite or a Dependent Child Link hinges on the specific business scenario:

  • Multi-Active Satellites are ideal when multiple descriptive attributes of an entity are valid simultaneously, and these attributes need to be tracked over time. This structure allows for capturing the history of changes effectively.
  • Dependent Child Links are suitable when there is a need to uniquely identify records through a combination of keys, especially in transactional contexts where multiple related records exist, such as invoice line items.

It’s essential to assess the nature of the data and the business requirements to determine the most appropriate modeling technique.

Conclusion

Both Multi-Active Satellites and Dependent Child Links offer valuable structures in Data Vault modeling, each catering to different scenarios involving multiple concurrent records. By understanding their definitions, applications, and differences, data modelers can make informed decisions to design robust and efficient data warehouses.

Watch the Video

Meet the Speaker


Trung Ta
Senior Consultant

Trung has been a Senior BI Consultant since 2019. As a Certified Data Vault 2.0 Practitioner at Scalefree, his areas of expertise include data warehousing in cloud environments as well as Data Vault 2.0 modeling and implementation, especially (but not limited to) with WhereScape 3D/RED. He has been working with industry leaders in the insurance and finance sectors, advising them on building their own Data Vault 2.0 solutions.

Interview with Julien Redmond, Creator of IRiS

Interview with Julien Redmond

Welcome to another edition of Data Vault Friday! I’m Michael Olschimke, CEO of Scalefree, and every Friday at 11 o’clock, we dive into discussions about Data Vault, data mining, cloud computing, and any data-driven applications. Today, we have a special guest—Julien Redmond from Ignition Data in Australia, who’s been working with us as a partner. Julien has developed the IRiS Data Vault automation tool, and he’s here to share insights about this innovative solution.



The Global IRiS Tour

Julien has been traveling the globe, promoting IRiS and ensuring that everyone knows about this groundbreaking tool. IRiS focuses on simplifying the data engineering aspects of Data Vault, rather than the modeling tasks, making it accessible and easy to use. Julien’s goal was to create a process so straightforward that anyone could learn it in less than a day. This simplicity allows teams to make Data Vault tasks repeatable and manageable, even for junior members.

What Sets IRiS Apart?

With so many Data Vault automation offerings available, IRiS stands out by addressing common pain points. The tool aims to minimize the steep learning curves often associated with other automation tools and facilitates seamless knowledge transfer between experienced and new users. It’s designed to integrate smoothly with existing data management platforms—whether that’s Microsoft Data Factory, AWS Glue, or other established tools—without disrupting current systems.

Seamless Integration

IRiS requires a minimal amount of metadata, which can be easily extracted from any modeling tool. This means there’s no new modeling interface to learn—just feed the metadata into IRiS, and it generates the necessary stored procedures and data definition scripts. This integration approach ensures that companies can leverage their existing platforms while adding powerful Data Vault automation capabilities.

Empowering Data-Driven Organizations

IRiS supports a range of target platforms like Databricks, Snowflake, and Microsoft tools, aligning with the growing trend of moving towards Lakehouse architectures. Organizations can incrementally move data into the Lakehouse based on specific use cases, promoting value-driven design and delivery. Julien emphasized that IRiS is lightweight, inexpensive, and comes as a single container—making it easy to deploy and use without significant overhead.

Learning and Community Support

One of the standout features of IRiS is its supportive learning environment. It includes an online training program with six hours of videos, a comprehensive playbook blending Data Vault methodology with practical user guidance, and access to a knowledge hub with tips and tricks. New users can get up to speed quickly, reinforcing their learning with a supportive community ready to help when needed.

Future of IRiS

Julien’s global tour reflects the excitement and confidence behind IRiS. As he visits partners worldwide—from Finland to the US—he’s spreading the word about how IRiS can transform Data Vault engineering, especially for organizations invested in cloud platforms. The response so far has been overwhelmingly positive, with teams appreciating how IRiS fits into their existing infrastructures while simplifying their workflows.

That wraps up this special session of Data Vault Friday! Thanks for joining us, and a big thanks to Julien for sharing his journey with IRiS. We’ll return to our usual Q&A format next time, so be sure to bring your questions. Until then, have a fantastic weekend!

Watch the Video

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Microsoft Fabric as an Enterprise Data Platform


Introduction to Microsoft Fabric and dbt Cloud

In today’s digital world, organizations need a unified, scalable, and collaborative data platform to power analytics, AI-driven insights, and business intelligence. Enter Microsoft Fabric: a comprehensive, role-based, SaaS-delivered data platform that brings together key Azure services on a single OneLake foundation with built-in AI capabilities.

In this article, we’ll explore how Microsoft Fabric can serve as your enterprise data platform, how it integrates with data modeling tools like dbt Cloud, and a proven “medallion” reference architecture that takes you from raw data ingestion to business-ready information marts. We’ll also discuss future extensions, practical limitations, and best practices to guide your journey.

Microsoft Fabric as an Enterprise Data Platform

This webinar covers leveraging Microsoft Fabric to implement a modern, end-to-end data platform. You will learn how the different Fabric services can be combined to implement a medallion architecture, supported by Data Vault 2.0 and dbt Cloud. A live demo shows lakehouses, warehouses, and Hubs, Links, and Satellites in a real-world scenario!

Watch webinar recording

Quick Primer: The Data Vault Methodology

Before diving into Fabric, it’s helpful to understand the Data Vault approach—an architecture pattern that brings agility, auditability, and scalability to your data warehouse. It comprises three core components:

  • Business Keys: Unique identifiers of business objects (e.g., customer number in a CRM).
  • Descriptive Data: Attributes that describe business keys (e.g., customer name, birthdate), which evolve over time.
  • Relationships: Linkages between business keys (e.g., customer–order relationships in a CRM).

By separating these elements into hubs, satellites, and links, Data Vault provides a repeatable, auditable framework for loading and tracing data lineage, perfectly suited for modern cloud platforms.

Figure: Hubs, Links, and Satellites

Microsoft Fabric: Core Front-Ends and Services

At its heart, Microsoft Fabric brings together seven role-based “front-end” experiences, but three of them are key to enterprise data engineering and warehousing:

Data Factory

  • Data Flows: Low-code transformations (joins, aggregations, cleansing) via a Power Query-like interface.
  • Data Pipelines: Petabyte-scale ETL/ELT workflows with full control-flow constructs (if/else, loops).

Use case: Ingest raw data from relational, semi-structured, or unstructured sources into your landing zone lakehouse.

Data Engineering

  • Lakehouses: Unified storage for structured/unstructured data in Delta-Parquet format, with SQL endpoints for analytics.
  • Notebooks: Interactive Python, R, or Scala environments for data prep, analysis, and data science exploration.
  • Spark Job Definitions: Batch and streaming ETL jobs on Spark clusters.
  • Data Pipelines: Orchestrated sequences of collection, processing, and transformation steps.

Use case: Land raw data and expose it to data scientists or further transformation processes.

Data Warehouse

  • Warehouses: Relational-style databases with Delta-Parquet storage, instant elastic scale, and full transactional support.
  • Support for cross-warehouse queries and seamless read access to lakehouses.

Use case: Implement Data Vault’s Raw Vault, Business Vault, and Information Marts for BI consumption.

Intelligent Data Front-ends

Workspaces

All Fabric resources live inside workspaces, which group lakehouses, warehouses, notebooks, pipelines, and more. Workspaces enable:

  • Role-based access control and collaboration
  • Integration with Git for versioning and CI/CD
  • Cross-workspace data access via shortcuts

Integrating dbt Cloud with Microsoft Fabric

dbt Cloud is an industry-leading transformation framework that brings software engineering best practices to your data models: modular SQL, testing, documentation, and CI/CD. In Fabric, dbt Cloud:

  • Connects to a Fabric workspace as a data warehouse endpoint
  • Generates SQL models (SELECT statements), reading from lakehouses or warehouses
  • Executes those models natively on Fabric warehouses

Key benefit: dbt manages your Data Vault layers (hubs, links, satellites, and information marts) with clear lineage, testing, and version control—while Fabric handles execution, storage, and compute elasticity.

Reference Architecture: The Medallion Approach on Fabric

The modern “medallion” architecture separates data into three refinement layers—Bronze (raw), Silver (conformed/business), and Gold (BI-ready). Here’s how it maps onto Fabric:

Bronze (Landing Zone Lakehouse)

Data Factory pipelines copy raw relational, JSON, and unstructured files into a lakehouse. This fully persisted, immutable history remains read-only for most users.

Silver (Raw & Business Vault Warehouses)

  • Raw Vault Warehouse: dbt models generate staging views/tables with hash keys, load dates, and audit metadata.
  • Business Vault Warehouse: dbt builds hubs, links, and satellites based on business keys and relationships.
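
Conceptually, those staging models enrich every incoming record with a deterministic hash key plus audit metadata. A Python sketch of the same computation; in a real project this lives in dbt SQL/Jinja macros, and column names such as hk_customer and record_source are only illustrative:

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    """Deterministic hash key over the normalized business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

record = {"customer_number": "C-1001", "name": "Jane Doe"}
staged = {
    **record,
    "hk_customer": hash_key(record["customer_number"]),   # hub hash key
    "load_date": datetime.now(timezone.utc).isoformat(),  # load metadata
    "record_source": "crm.customers",                     # audit metadata
}
print(staged)
```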

Gold (Information Mart Warehouse)

Information marts—star or snowflake schemas—are created via dbt models as optimized, query-ready tables for BI tools (Power BI, Tableau, etc.).

Figure: dbt Cloud and Microsoft Fabric medallion architecture

Live Demo Highlights

During our webinar demonstration, we walked through:

  • Setting up a Fabric workspace and viewing lakehouse tables via SQL and the Windows Explorer integration
  • Using a Data Factory pipeline to ingest sample Snowflake data into a landing zone lakehouse
  • Authoring dbt models in dbt Cloud to create staging (hashing, load dates), hub tables, link tables, and satellites
  • Executing dbt runs that generate and run SQL in Fabric warehouses, and previewing results directly in the Fabric UI
  • Accessing all data files and Delta-Parquet tables seamlessly in Windows Explorer for multi-cloud portability

Outlook: Next-Gen Enhancements

Beyond the core implementation, here are exciting ways to evolve your Fabric-dbt platform:

  • Workspace Segmentation & Data Mesh: Create dedicated workspaces for medallion layers or business domains, and stitch multiple dbt projects together with dbt Mesh for a true data mesh design.
  • Real-Time Data Integration: Leverage Fabric’s built-in streaming capabilities to blend real-time feeds into your warehouses alongside batch data.
  • Enhanced Governance & Semantic Layers: Define and enforce semantic models both in dbt and in Fabric (via semantic models) to ensure consistent metrics across all BI tools.
  • Data Science Collaboration: Grant read-only access to bronze lakehouses and empower data scientists to use Fabric notebooks (Python, R, Scala) for ad-hoc analysis and advanced ML experiments.
  • Simplified Migration: Existing dbt projects on on-prem or other cloud warehouses can be repointed to Fabric with minimal code changes—especially when using community macros for Data Vault deployments.

Considerations & Limitations

While Fabric is powerful, be mindful of:

  • Write Support: Lakehouse SQL endpoints are read-only; SQL-based transformations must target Fabric warehouses.
  • Shortcut Management: Cross-workspace shortcuts must be manually maintained; frequent schema changes can add overhead.
  • Multiple Overlapping Tools: Data Factory, Data Engineering pipelines, notebooks, and dbt all offer ETL—establish clear standards to avoid confusion.
  • Product Maturity: As a relatively new platform, UI changes and minor bugs may appear; plan for iterative improvements.
  • Capacity Transparency: Compute and storage share capacity; monitor and size your Fabric capacity carefully to meet SLAs.

Conclusion

Microsoft Fabric, coupled with dbt Cloud, delivers an end-to-end Enterprise Data Platform that unifies data ingestion, storage, transformation, and consumption. By applying proven patterns like the medallion architecture and Data Vault methodology, you can build a scalable, collaborative, and governed environment, empowering both data engineers and business users to unlock insights faster.

Ready to take your data platform to the next level? Reach out for a tailored workshop, architecture advisory, or hands-on implementation support.

– Tim Kirschke (Scalefree)

Modelling Salesforce History Tables in Data Vault

Modelling Salesforce History Tables

Salesforce tracks changes to configured attributes by storing them in history tables. This data, which includes record ID, field name, old and new values, and timestamps, presents a unique challenge for Data Vault modeling. In this article, we’ll explore an optimal way to model this data using Data Vault principles.



Understanding Salesforce History Tables

Salesforce allows tracking of specific attribute changes within objects like Contacts. These changes are stored in history tables such as ContactHistory. Each entry logs:

  • Record ID (e.g., Contact ID)
  • Field Name
  • Old Value
  • New Value
  • Timestamp

Challenges in Modeling Salesforce History Data

When designing a Data Vault model for this history data, there are key challenges to consider:

  • Handling multiple changes for the same record within a short time frame
  • Maintaining referential integrity
  • Efficiently querying and pivoting data for reporting

Approach: Multi-Active Satellite

A common initial approach is to model the history table as a multi-active satellite attached to a Contact Hub, with the field name as the dependent key. However, this approach has pitfalls:

  • Duplicates can arise if multiple changes occur for the same field in the same batch
  • Timestamp-based keys are unreliable due to possible duplicate timestamps

To counter this, a unique sequence number should be assigned in the staging area and used as a dependent key.
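
A minimal sketch of assigning those sequence numbers in the staging area, assuming changes arrive in order within a batch; all values are invented:

```python
from collections import defaultdict

# Raw history rows in one batch: (contact_id, field, old_value, new_value, ts)
batch = [
    ("C-1", "Email", "a@x.com", "b@x.com", "2024-05-01T10:00:00"),
    ("C-1", "Email", "b@x.com", "c@x.com", "2024-05-01T10:00:00"),  # same ts!
    ("C-1", "Phone", "111", "222", "2024-05-01T10:00:00"),
]

counters: dict[tuple[str, str], int] = defaultdict(int)
staged = []
for contact_id, field_name, old, new, ts in batch:
    counters[(contact_id, field_name)] += 1
    seq = counters[(contact_id, field_name)]  # unique dependent key per change
    staged.append((contact_id, field_name, seq, old, new, ts))

for row in staged:
    print(row)
```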

Optimized Approach: Non-Historized Link

Instead of a multi-active satellite, a non-historized link can be used to model Salesforce history data more efficiently. Here’s how it works:

  • Create a non-historized link connecting the Contact and User hubs.
  • Store change-related attributes (field name, old value, new value, timestamp) directly within this link.
  • Use the timestamp as an event-based attribute rather than part of the primary key.

This approach avoids the need for complex joins and simplifies querying.

Efficient Data Retrieval: Pivoting

Since history tables are structured in a key-value format, queries often require pivoting. By using database pivot functions, we can restructure the data into a more usable format for reporting without excessive joins.
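
A minimal sketch of the pivoting idea in plain Python, keeping the latest value per field; in the database this would typically be a PIVOT or conditional aggregation:

```python
# Key-value change rows: (contact_id, field_name, new_value), latest last.
rows = [
    ("C-1", "Email", "b@x.com"),
    ("C-1", "Phone", "222"),
    ("C-1", "Email", "c@x.com"),
]

pivoted: dict[str, dict[str, str]] = {}
for contact_id, field_name, new_value in rows:
    # Later rows overwrite earlier ones, leaving the latest value per field.
    pivoted.setdefault(contact_id, {})[field_name] = new_value

print(pivoted)  # {'C-1': {'Email': 'c@x.com', 'Phone': '222'}}
```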

Alternative Consideration: JSON Storage

Another approach is to store change data as a JSON object in a standard satellite. This method offers flexibility, particularly when dealing with a large number of attributes. However, it complicates querying and should be used only when necessary.

Conclusion

For most cases, a non-historized link is the optimal way to model Salesforce history tables in Data Vault. It simplifies data storage, reduces the need for extensive joins, and enhances query performance. Multi-active satellites are an alternative but require careful handling of duplicate timestamps and field changes.

Watch the Video

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Learning from DORA: Data Governance Lessons for All Institutions

Learning from DORA: Data Governance Lessons for All Institutions

Want to improve your organization’s ability to withstand digital disruptions? This webinar unpacks the key lessons from the Digital Operational Resilience Act (DORA), providing practical takeaways you can implement immediately. Discover how DORA’s advanced approaches to data management, risk mitigation, and operational resilience can be adapted to enhance your organization’s security posture, regardless of your sector.

Watch the recording to enhance your digital resilience!

Webinar Details

  • Date: January 21st, 2025
  • Time: 11:00 – 12:00 CET

Watch Webinar Recording

Speakers


Lorenz Kindling
Senior Consultant

Lorenz works in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on data warehouse automation and Data Vault modeling. Since 2021, he has been advising renowned companies across various industries for Scalefree International. Before joining Scalefree, he worked as a consultant in the field of data analytics, which gave him a comprehensive overview of data warehousing projects and the common issues that arise.

The Power of Data Vault – Business Use Cases

Business Use Cases in Data Vault

In the world of data management and integration, businesses face many challenges. Data Vault is a methodology designed to address these challenges, offering a flexible and scalable solution for integrating and managing data across an enterprise. But when is it the right time to use Data Vault? Are there specific business scenarios where Data Vault’s power truly shines? This article explores the core benefits of Data Vault, its use cases, and how it can solve complex data integration problems.



Understanding the Pain Points

Before diving into when and where Data Vault is most beneficial, it’s important to understand the underlying pain points businesses face in their data management processes. According to Michael Olschimke, CEO of Scalefree, understanding the business pain points is crucial. If there is no significant problem, there may be no need for a new solution like Data Vault. The key is identifying situations where current methods fall short in handling data integration, privacy regulations, and evolving business rules.

The most common pain point is the challenge of data integration. Modern businesses typically operate with data spread across multiple sources, from internal systems to external data feeds. Integrating these data sources into a single, unified view is one of the biggest challenges. Whether you’re trying to generate reports, create dashboards, or analyze data for business insights, you need a consistent and reliable method to integrate data from diverse systems. This is where Data Vault excels.

The Core Strength of Data Vault: Data Integration

Data Vault is designed specifically for situations where integration is a priority. If a business needs to bring together multiple disparate data sources into a single framework for reporting, Data Vault offers a robust solution. Its flexibility allows businesses to combine data from different systems, apply various business logic, and present the data in a meaningful way.

In contrast to other methods, Data Vault shines when the data integration needs are complex. Simply dumping data into a data lake may seem like an easy solution, but it leaves businesses with the challenge of how to integrate these disparate datasets into a cohesive model. Without a clear method for integration, data lakes become isolated silos of information, and producing integrated reports becomes a significant challenge.

Addressing Regulatory Compliance and Privacy

In today’s data-driven world, businesses must also address regulatory requirements such as GDPR. One of the strengths of Data Vault is its built-in support for privacy and security regulations. When managing sensitive data, businesses need to ensure compliance with privacy regulations, including the ability to delete or anonymize personal data when necessary.

While other methods can also address regulatory concerns, Data Vault provides out-of-the-box patterns and solutions that are easy to implement and scale. For example, Data Vault allows businesses to securely store data, apply business rules, and remove or anonymize personal attributes without disrupting the overall data structure. This capability is crucial in today’s regulatory environment, where compliance is not just a best practice but a legal requirement.

Handling Changing Business Rules Over Time

Another key use case for Data Vault arises when businesses face changing business rules. Over time, companies evolve, and with this evolution comes changes in how data is processed and interpreted. For example, a business might need to apply different versions of a business rule to historical and current data, depending on when the rule was in effect.

Data Vault provides a solution to this challenge by separating the data transformation processes and storing them in the “business vault.” This separation allows businesses to apply different versions of business rules to different datasets. For instance, you might apply one rule to data from the previous year and a different rule to the current year’s data. This flexibility allows companies to adapt to new business requirements without overhauling their data architecture every time the rules change.
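
A toy illustration of that idea, picking a rule version by the record's business date; the rules themselves and the cutoff date are invented for the example:

```python
from datetime import date

def discount_rule_v1(amount: float) -> float:
    return amount * 0.95          # version in effect through 2023

def discount_rule_v2(amount: float) -> float:
    return amount * 0.90          # version in effect from 2024 onward

def apply_discount(amount: float, business_date: date) -> float:
    """Apply the rule version that was valid on the record's business date."""
    rule = discount_rule_v1 if business_date < date(2024, 1, 1) else discount_rule_v2
    return rule(amount)

print(apply_discount(100.0, date(2023, 6, 1)))  # 95.0 (historical rule)
print(apply_discount(100.0, date(2024, 6, 1)))  # 90.0 (current rule)
```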

Scalability and Flexibility

As businesses grow and their data needs become more complex, the scalability of their data management solutions becomes critical. Data Vault is highly scalable because it allows companies to add new data sources, apply new business rules, and adjust their data models as needed without requiring a complete redesign of their data infrastructure.

One of the most powerful features of Data Vault is its ability to “creatively destruct” incoming data. This means that data from different source systems can be broken down into fundamental components—such as business keys, relationships, and descriptive data. These components can then be recombined in any format or structure that suits the business’s reporting or analytical needs, whether that’s a star schema, flat tables, or any other target structure. This flexibility ensures that businesses can meet various use cases and reporting requirements using the same data platform.

Data Vault and Business Intelligence

In the realm of business intelligence (BI), Data Vault stands out as an effective method for managing large, complex datasets. It offers businesses the ability to handle multiple use cases, such as generating reports, analyzing trends, and forecasting future performance. Because it integrates data from multiple sources, it provides a single, reliable source of truth for reporting and analysis.

Unlike traditional BI systems, which often require multiple data platforms or complex ETL (extract, transform, load) processes, Data Vault allows businesses to use a single platform for all their BI needs. Whether you’re running operational reports, building data marts, or creating advanced analytics models, Data Vault’s flexibility ensures that businesses can handle various BI scenarios without the need for separate systems or tools.

Addressing Complex Data Models

While Data Vault is highly flexible, it can also become more complex as businesses face increasingly complex data models. The complexity arises when businesses deal with dirty data, unclear business key definitions, or overlapping data from different source systems. In these situations, Data Vault allows companies to address these challenges by adding new components to their data models, such as hubs, links, and satellites.

For instance, if a business has two different source systems with different business key definitions, Data Vault can create a new hub to store these keys and establish relationships between them. Similarly, when data quality is an issue, Data Vault allows businesses to add computed satellites to clean the data before it’s used for reporting or analysis. While these additional components can increase the complexity of the data model, they are essential for solving the challenges presented by messy, inconsistent, or incomplete data.

When Should You Use Data Vault?

Ultimately, the decision to use Data Vault depends on your business’s data requirements. If your data integration needs are relatively simple, or if you don’t have stringent privacy or regulatory requirements, other solutions might suffice. However, for businesses dealing with complex datasets, evolving business rules, and compliance challenges, Data Vault provides a comprehensive, scalable solution that addresses all these needs.

When evaluating whether Data Vault is the right choice for your organization, it’s essential to assess your current and future data needs. If you require robust data integration, the ability to apply different business rules over time, and compliance with privacy regulations, Data Vault is a powerful tool that can handle these challenges. Its flexibility and scalability ensure that it can grow with your business as your data needs evolve.

Conclusion

Data Vault is a powerful methodology for businesses that need to integrate complex data from multiple sources, apply evolving business rules, and comply with privacy regulations. While it may not be necessary for every business, for those facing challenges in these areas, Data Vault offers a robust, flexible, and scalable solution. By breaking down and restructuring data in a way that supports various reporting and analytical needs, Data Vault ensures businesses can keep up with the ever-changing demands of today’s data-driven world.

Watch the Video

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

How to Tackle GDPR with Data Vault

Understanding GDPR in the Context of Data Vault

GDPR compliance is a critical concern for organizations handling personal data. Data Vault, a well-structured data modeling approach, offers a robust solution for meeting GDPR requirements, particularly in two key areas: data security and data privacy.



Data Security in Data Vault

Data security involves protecting existing data from unauthorized access. Data Vault supports this through two levels of security:

  • Row-Level Security: This ensures that users can only access records relevant to them. It can be implemented via database row-level security features or view layers.
  • Column-Level Security: Attributes are separated based on security classification. Each classification is stored in a separate Satellite, with access granted accordingly.

By controlling access at both row and column levels, organizations can ensure compliance with GDPR’s data access requirements.

Data Privacy and Deletion in Data Vault

Data privacy focuses on removing personal data when required. Data Vault’s design allows for the physical deletion of personal data without affecting the integrity of the entire dataset. This is achieved through:

  • Satellite Splitting: Personal and non-personal data are stored in separate Satellites. When a deletion request is made, only the personal data Satellite needs to be altered.
  • Data Retention Policies: Different personal data attributes may have varying retention periods. Separate Satellites are created for attributes that must be deleted at different times.
  • Point-in-Time (PIT) Table Updates: When personal data is deleted, PIT tables are rebuilt to reflect the absence of that data.

This approach ensures that deleted data is no longer accessible or retrievable, aligning with GDPR’s right to be forgotten.

Access Control Lists (ACL) in Data Vault

Managing user access to data is another essential aspect of GDPR compliance. Data Vault facilitates this through an ACL system modeled using Hubs and Links:

  • A User Hub stores information about individual users.
  • A User Group Hub categorizes users into groups with shared permissions.
  • A Customer Hub and Bank Account Hub manage customer and account details.
  • A Link connects users, user groups, and customers.
  • An Effectivity Satellite records the time periods during which users have access to specific data.
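
A minimal sketch of how such effectivity records can drive a point-in-time access check; the users, resources, and dates are invented:

```python
from datetime import date

# Effectivity records: (user, resource, valid_from, valid_to); None = still open.
access = [
    ("u1", "bank_account_42", date(2023, 1, 1), date(2024, 6, 30)),
    ("u2", "bank_account_42", date(2024, 1, 1), None),
]

def has_access(user: str, resource: str, on: date) -> bool:
    """True if any effectivity record covers the given date."""
    return any(
        u == user and r == resource and start <= on and (end is None or on <= end)
        for u, r, start, end in access
    )

print(has_access("u1", "bank_account_42", date(2024, 12, 1)))  # False (expired)
print(has_access("u2", "bank_account_42", date(2024, 12, 1)))  # True
```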

By applying this structure, access control can be managed dynamically, ensuring that only authorized users can view or modify data.

Security vs. Privacy: A Crucial Distinction

When discussing GDPR, it’s essential to distinguish between security and privacy:

  • Security: The data remains in the system, but access is restricted based on security policies.
  • Privacy: The data is physically removed from the system when no longer needed.

Organizations should ensure that security officers and privacy officers handle these concerns separately to avoid misconceptions, such as assuming filtered data is deleted when it is still present in the database.

Conclusion

Data Vault provides a comprehensive approach to managing GDPR requirements through built-in security and privacy mechanisms. By structuring data appropriately and implementing proper access control and deletion strategies, organizations can achieve GDPR compliance efficiently.

Watch the Video

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

How to Explain Data Vault to Business Users?

How to Explain Data Vault

When introducing Data Vault to business users, it’s important to communicate its value in a way that resonates with them. Instead of focusing on the technical details, it’s best to highlight the benefits and business impact.



Understanding the Audience

Business users, including executives and commercial leaders, generally don’t need to understand the intricacies of Data Vault. They are more interested in the outcomes, such as data accessibility, security, and adaptability.

Explaining Data Vault with a Simple Analogy

Imagine you are building a house. As a homeowner, you don’t need to know every construction detail; you just want to ensure that it’s solid, safe, and has all the necessary features. Similarly, business leaders don’t need to understand the technical framework of Data Vault—they just want a reliable data management system that supports their decision-making.

Focusing on Business Value

Instead of using the term “Data Vault,” it’s often more effective to discuss the advantages of a managed data platform:

  • Data Security & Privacy: Ensures compliance with regulations like GDPR while securing sensitive information.
  • Data Integration: Consolidates structured, semi-structured, and unstructured data from multiple sources.
  • Auditability & Transparency: Provides full data lineage, ensuring that every data point can be traced back to its source.
  • Agile Data Delivery: Enables incremental delivery of insights, so business teams don’t have to wait months or years to see results.
  • Adaptability to Change: Easily adjusts to changes in source systems, business logic, and reporting needs.
  • Handling Multiple Business Timelines: Supports complex business requirements, including postdating and backdating of records.
  • Scalability: Handles large datasets and high-speed data processing.

Delivering Business Outcomes

Business leaders care about measurable outcomes. With a Data Vault-based platform, they can expect:

  • Improved decision-making with accurate, timely data.
  • Faster adaptation to market changes and customer demands.
  • Cost efficiency through a structured yet flexible data architecture.
  • Enhanced reporting and analytics with reliable data sources.

Making the Pitch to Business Users

When discussing Data Vault with executives, avoid technical jargon and focus on business goals. Instead of saying, “We use Data Vault 2.0 for data modeling,” say, “We have a data platform that ensures secure, auditable, and easily accessible insights to drive your business forward.”

By emphasizing real-world benefits, you can effectively communicate the value of Data Vault without overwhelming non-technical stakeholders with complexity.

Conclusion

Communicating the benefits of Data Vault to business users requires a shift from technical explanations to business value discussions. By framing the conversation around security, agility, and data-driven decision-making, you can successfully gain buy-in from stakeholders and demonstrate the impact of a well-managed data platform.

Watch the Video

Meet the Speaker


Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Use Data Vault 2.0 to Tackle GDPR

Why use Data Vault 2.0 to Tackle GDPR?

Today, we explore how Data Vault 2.0 can be a powerful tool for addressing the challenges posed by the General Data Protection Regulation (GDPR). GDPR requires organizations to protect the personal data of European citizens and grants individuals the “right to be forgotten”. This article outlines how Data Vault 2.0 can simplify compliance with GDPR while maintaining the integrity of your data warehouse.



Understanding GDPR and its Challenges

GDPR, implemented in 2018 by the European Union, sets strict rules for handling personal data. One key aspect is the right to be forgotten, allowing individuals to request the deletion of their personal information from an organization’s systems. For data warehousing and analytics, this can be particularly challenging as organizations often need to retain some data for analytical purposes while complying with GDPR’s deletion requirements.

The Data Vault 2.0 Approach to GDPR

Data Vault 2.0 provides a structured way to tackle GDPR compliance through its unique data modeling techniques. At its core, Data Vault separates data into three main components: Hubs, Links, and Satellites. Satellites are used to store descriptive attributes of business keys, and with GDPR, we can utilize a method called Satellite Splits to manage personal and non-personal data effectively.

Satellite Splits

Satellite splits involve creating separate Satellites for personal and non-personal data. For example:

  • Personal Satellite: Contains personal information such as names, addresses, and email addresses. This data must be deleted if a customer exercises their right to be forgotten.
  • Non-Personal Satellite: Stores non-identifiable data such as regions or generated technical data, which can be retained for analytics even after personal data is removed.

When a deletion request is received, you can simply delete the records from the Personal Satellite while retaining the non-personal data for analytical use. This ensures compliance with GDPR while preserving valuable business insights.
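
A minimal sketch of those deletion mechanics, using dictionaries keyed by the hub hash key as stand-ins for the two Satellites:

```python
# Two satellites sharing the same hub hash keys.
personal_sat = {
    "hk1": {"name": "Jane Doe", "email": "jane@example.com"},
    "hk2": {"name": "John Roe", "email": "john@example.com"},
}
non_personal_sat = {
    "hk1": {"region": "EMEA", "segment": "Retail"},
    "hk2": {"region": "APAC", "segment": "Corporate"},
}

def forget(hub_key: str) -> None:
    """Right to be forgotten: physically delete only the personal attributes."""
    personal_sat.pop(hub_key, None)

forget("hk1")
print("hk1" in personal_sat)     # False - the PII is gone
print(non_personal_sat["hk1"])   # region/segment retained for analytics
```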

Addressing Privacy-Relevant Business Keys

One of the challenges with GDPR is managing business keys that are tied to personal data, such as social security numbers. If such keys are used in Hubs, deleting personal data becomes complicated. Here’s how Data Vault 2.0 handles this:

Using Artificial Hubs

To avoid using personal attributes as business keys, Data Vault 2.0 introduces artificial Hubs. These Hubs assign unique, non-identifiable numbers to replace personal identifiers. For example:

  • An artificial Hub might contain a generated number for each customer’s car insurance data.
  • A Link connects the artificial Hub to the personal data stored in a Satellite.

When a customer requests deletion, you delete the connection between the personal identifier and the artificial number in the Link. The artificial Hub remains intact, allowing you to retain non-personal data for analytics without risking re-identification.
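
A minimal sketch of that mechanism; the keys are invented, and the Link mapping is the only path from the personal identifier back to the artificial key:

```python
# Artificial Hub: opaque keys that carry no personal meaning.
artificial_hub = {"a1", "a2"}

# Link: personal business key -> artificial key (the re-identification path).
link = {"SSN-123-45-6789": "a1", "SSN-987-65-4321": "a2"}

def handle_deletion_request(personal_key: str) -> None:
    """Drop only the mapping; the artificial Hub and its analytics survive."""
    link.pop(personal_key, None)

handle_deletion_request("SSN-123-45-6789")
print(link)            # mapping removed - no way back to the person
print(artificial_hub)  # data keyed by "a1" remains usable for analytics
```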

Best Practices for Implementing GDPR with Data Vault 2.0

  • Avoid Personal Identifiers as Business Keys: Always opt for non-personal or artificial identifiers wherever possible to simplify the model.
  • Use Randomized Identifiers: Generate UUIDs or random sequence numbers to prevent reverse-engineering personal data.
  • Collaborate with Legal Teams: Work closely with legal experts to define which data can be retained and which must be deleted under GDPR.

By adhering to these practices, organizations can create a robust Data Vault model that simplifies GDPR compliance while maintaining data integrity and analytics capabilities.

Conclusion

Data Vault 2.0 offers a flexible and efficient approach to tackling GDPR challenges. By leveraging Satellite splits and artificial Hubs, organizations can balance regulatory compliance with business needs. While managing GDPR compliance may seem complex at first, the structured approach of Data Vault 2.0 ensures that your data remains both secure and useful.

For further learning, join the Data Vault Innovators Community or participate in Data Vault Fridays hosted by Scalefree. These resources provide valuable insights and opportunities to explore topics like GDPR, data warehousing, and more.

Watch the Video

Meet the Speaker


Lorenz Kindling

Lorenz works in Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on data warehouse automation and Data Vault modeling. Since 2021, he has been advising renowned companies across various industries for Scalefree International. Before joining Scalefree, he worked as a consultant in the field of data analytics, which gave him a comprehensive overview of data warehousing projects and the common issues that arise.
