Skip to main content
search
0
Scalefree Knowledge Webinars Data Vault Friday Soft-Deleting Records in Data Vault: A Real-World Approach to Status Tracking Satellites

Soft-Deleting Records in Data Vault

When working with Data Vault in real-world enterprise data warehousing projects, managing soft-deleted records is more than just a theoretical exercise—it’s a necessity. While many books and training examples offer simplified scenarios, real-world implementations must take into account complex requirements such as status tracking from multiple source systems. In this article, we dive into how to virtualize dimensions in a Data Vault model with proper handling of status tracking satellites, ensuring that soft deletions are effectively managed and the integrity of historical records is preserved.



The Problem with Simplified Examples

Most Data Vault tutorials focus on core concepts: Hubs, Links, and standard Satellites. These foundational examples are useful for learning but fall short when we need to address data that has been logically removed or soft-deleted. In real-world systems, this is a common occurrence, and ignoring it risks producing inaccurate analytics and flawed dimension models.

The key challenge is to determine whether a business entity—like a customer, product, or concept—is still “active” in the eyes of the business. This gets even trickier when the data comes from multiple source systems, each with its own deletion logic. That’s where Status Tracking Satellites (also known as Effectivity Satellites) come in.

Scenario Overview: Multi-Source Concept with Soft Deletes

Consider a scenario where a Concept (e.g., “Customer”) is fed by two different source systems. Here’s a breakdown of the Data Vault objects involved:

  • Concept_PIT – A Point-In-Time (PIT) table indexing data across all satellites.
  • Concept_SAT_S_source1 – A standard satellite with descriptive data from Source 1.
  • Concept_SATST_source1 – A status tracking satellite for Source 1.
  • Concept_SAT_S_source2 – A standard satellite from Source 2 with more descriptive attributes.
  • Concept_SATST_source2 – A status tracking satellite for Source 2.

In this setup, each source system tracks its own deletions, independently of the other. That means a Concept could be deleted in one source but still be active in another. Properly modeling and querying this requires careful integration of all status indicators.

What Do Status Tracking Satellites Actually Do?

Contrary to some misunderstandings, status tracking satellites are not used to track the deletion of descriptive attributes. That’s the role of standard satellites. Instead, they track whether the entire business key has been logically deleted in the source system. For example, if a customer row is completely removed from a source table, the status tracking satellite records that deletion event.

This distinction is important: you might null out a field in the source system without deleting the record. The standard satellite captures that. But if the whole customer row is deleted, only the status tracking satellite will catch it.

How to Use the PIT Table

The PIT table is key to virtualizing dimensions. It acts as a bridge, linking together the various satellites—both descriptive and status tracking—by capturing the effective row per business key and timestamp. Your virtual dimension view selects from the PIT table and joins all relevant satellites using hash keys and load dates.

In this case, you treat the status tracking satellites just like standard satellites. You join them on hash keys and load dates using PIT indexes. This lets you bring in any flags, such as IsActive or deletion timestamps, as part of your view logic.

Business Rules for Determining Active Status

One of the biggest questions in designing this architecture is: How do we determine if a business key is still active?

There are a few approaches:

  • Single-source dominance: If the customer is deleted in the primary (golden) source, the business key is considered deleted.
  • All-source consensus: The business key is only considered deleted if it’s removed from all source systems.

Each approach has its pros and cons. The decision should be based on your business rules and requirements. You can use simple boolean flags like IsActive or more advanced logic combining multiple status indicators from different sources.

Should You Remove Deleted Entities from Dimensions?

This is a hot topic. Some organizations want to remove soft-deleted entities from their dimension tables entirely. While that may sound clean, it can create problems downstream—especially in fact tables where foreign key references still exist for those deleted entities.

The recommended approach? Flag them instead of deleting them. This preserves history and maintains referential integrity. It also helps analysts understand that, yes, the product or customer was deleted—but it still contributed to revenue or other KPIs in the past.

Virtualizing the Dimension View

Your final dimension view should:

  • Select from the PIT table for the appropriate concept.
  • Join to all descriptive satellites using the hash key and PIT load dates.
  • Join to all status tracking satellites similarly, treating them like descriptive sources.
  • Derive an IsActive or IsDeleted flag using business logic.

Here’s a simplified example of what that SQL might look like:

SELECT 
  pit.BusinessKey,
  sat1.Description1,
  sat2.Description2,
  CASE 
    WHEN st1.IsActive = 1 AND st2.IsActive = 1 THEN 1
    ELSE 0
  END AS IsActive
FROM Concept_PIT pit
LEFT JOIN Concept_SAT_S_source1 sat1 
  ON pit.HashKey = sat1.HashKey AND pit.LoadDate1 = sat1.LoadDate
LEFT JOIN Concept_SAT_S_source2 sat2 
  ON pit.HashKey = sat2.HashKey AND pit.LoadDate2 = sat2.LoadDate
LEFT JOIN Concept_SATST_source1 st1 
  ON pit.HashKey = st1.HashKey AND pit.LoadDateST1 = st1.LoadDate
LEFT JOIN Concept_SATST_source2 st2 
  ON pit.HashKey = st2.HashKey AND pit.LoadDateST2 = st2.LoadDate

This query structure ensures that all available data is consolidated and evaluated according to business-specific logic to determine the final dimension state.

Final Thoughts

Handling soft deletes in a Data Vault using status tracking satellites is a robust and scalable solution for real-world enterprise systems. The key is to treat these satellites as regular descriptive tables, include them in your PIT tables, and let business logic drive how deletions are interpreted in the dimension views.

Instead of deleting records from your dimensions—which can break fact table relationships—simply flag them. This gives you the full power of historical traceability while still providing clear information about current entity status. As always, your implementation should follow the business rules defined by your organization or your client.

Watch the Video

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

The Data Vault Handbook

Build your path to a scalable and resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read the Book Now

Leave a Reply

Close Menu