Refactoring a Data Vault Model: Options, Risks, and Best Practices
Source systems change. Columns get added or removed, structures evolve, and sometimes entire business key definitions are overhauled. When that happens to a system feeding your Data Vault, the question isn’t just technical — it’s strategic. Do you modify what you have, or do you build alongside it? This post walks through the main scenarios and the practical options available for each, along with a clear recommendation on where to draw the line between low-risk and high-risk approaches.
In this article:
- When a Column Changes: The Simple Case
- When the Business Key Changes: The Complex Case
- Option A: Keep Old and New Structures Separate
- Option B: Refactor the Raw Data Vault
- Handling Non-Unique Business Keys in the Interim
- Communication: The Overlooked Part of Refactoring
- The Bottom Line on Refactoring
- Watch the Video
When a Column Changes: The Simple Case
The least disruptive scenario is a column-level change in a source table — a new attribute appears, or an existing one disappears. For this, you have a few options depending on your project constraints.
Option 1 — Modify the existing Satellite. If your project allows structural changes, you can add the new column to the existing Satellite with an ALTER TABLE statement. Rows loaded before the change will show null values for the new column, and a log entry can record exactly when the column was added. Removing a column from a Satellite is generally not done — historical data lives in that column, and dropping it means losing that history.
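A minimal sketch of Option 1, using Python's built-in sqlite3 module and hypothetical table and column names (`sat_customer`, `loyalty_tier`): the Satellite is extended in place, and rows loaded before the change simply carry nulls in the new column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sat_customer (
        hk_customer   TEXT NOT NULL,   -- hash key of the parent Hub
        load_datetime TEXT NOT NULL,   -- load timestamp
        record_source TEXT NOT NULL,
        customer_name TEXT,
        PRIMARY KEY (hk_customer, load_datetime)
    )
""")
# A delta loaded before the source system changed:
con.execute("INSERT INTO sat_customer VALUES ('hk1', '2023-01-01', 'crm', 'Alice')")

# The source added a 'loyalty_tier' attribute; extend the Satellite in place.
con.execute("ALTER TABLE sat_customer ADD COLUMN loyalty_tier TEXT")
con.execute("INSERT INTO sat_customer VALUES ('hk1', '2024-01-01', 'crm', 'Alice', 'gold')")

# Historical rows show NULL for the column added later:
rows = con.execute(
    "SELECT load_datetime, loyalty_tier FROM sat_customer ORDER BY load_datetime"
).fetchall()
print(rows)  # [('2023-01-01', None), ('2024-01-01', 'gold')]
```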
Option 2 — Create a new Satellite. If you’re not allowed to touch existing structures, or simply prefer not to, you create a new Satellite to capture the new or changed attributes. This Satellite gets added to the relevant PIT Tables. The trade-off is an additional join in your queries, but the existing Satellite and its data remain completely untouched.
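To illustrate the trade-off of Option 2, here is a sketch of the extra join introduced by the second Satellite, again with hypothetical names (`sat_customer_ext` for the new Satellite, `pit_customer` with one load-datetime pointer column per Satellite — PIT layouts vary by implementation):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sat_customer     (hk_customer TEXT, load_datetime TEXT, customer_name TEXT);
    CREATE TABLE sat_customer_ext (hk_customer TEXT, load_datetime TEXT, loyalty_tier TEXT);
    -- PIT table: one row per key and snapshot, pointing at the valid
    -- load_datetime in each attached Satellite.
    CREATE TABLE pit_customer (
        hk_customer TEXT, snapshot_date TEXT,
        sat_customer_ldts TEXT, sat_customer_ext_ldts TEXT
    );
    INSERT INTO sat_customer     VALUES ('hk1', '2023-01-01', 'Alice');
    INSERT INTO sat_customer_ext VALUES ('hk1', '2024-02-01', 'gold');
    INSERT INTO pit_customer     VALUES ('hk1', '2024-03-01', '2023-01-01', '2024-02-01');
""")
rows = con.execute("""
    SELECT p.hk_customer, s1.customer_name, s2.loyalty_tier
    FROM pit_customer p
    LEFT JOIN sat_customer s1
           ON s1.hk_customer = p.hk_customer
          AND s1.load_datetime = p.sat_customer_ldts
    LEFT JOIN sat_customer_ext s2          -- the additional join from Option 2
           ON s2.hk_customer = p.hk_customer
          AND s2.load_datetime = p.sat_customer_ext_ldts
""").fetchall()
print(rows)  # [('hk1', 'Alice', 'gold')]
```

The original Satellite stays untouched; the cost is one more equi-join per query, which the PIT table keeps cheap.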
Option 3 — Close and replace the Satellite. A slightly more thorough approach: close the existing Satellite (stop loading it) and create a brand new one that reflects the updated structure. The new Satellite starts with a full load from the source, which means some data overlap with the old Satellite. This is handled cleanly at query time using an IIF expression — prefer data from the new Satellite where it exists, fall back to the old one for earlier history. The redundancy is not a problem; it resolves itself during query execution.
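The IIF fallback from Option 3 can be sketched like this (SQLite 3.32+ supports IIF; table names are hypothetical — `_v1` is the closed Satellite, `_v2` the replacement whose initial full load overlaps it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sat_product_v1 (hk_product TEXT, load_datetime TEXT, price REAL);
    CREATE TABLE sat_product_v2 (hk_product TEXT, load_datetime TEXT, price REAL);
    INSERT INTO sat_product_v1 VALUES ('hk1', '2022-06-01', 9.99);
    INSERT INTO sat_product_v1 VALUES ('hk2', '2022-06-01', 4.50);
    -- Full initial load of the new Satellite overlaps hk1:
    INSERT INTO sat_product_v2 VALUES ('hk1', '2024-01-01', 12.99);
""")
rows = con.execute("""
    SELECT k.hk_product,
           -- Prefer the new Satellite where a row exists, else fall back:
           IIF(n.hk_product IS NOT NULL, n.price, o.price) AS price
    FROM (SELECT hk_product FROM sat_product_v1
          UNION
          SELECT hk_product FROM sat_product_v2) k
    LEFT JOIN sat_product_v2 n ON n.hk_product = k.hk_product
    LEFT JOIN sat_product_v1 o ON o.hk_product = k.hk_product
    ORDER BY k.hk_product
""").fetchall()
print(rows)  # [('hk1', 12.99), ('hk2', 4.5)]
```

The overlap between the two Satellites never surfaces in results: the expression resolves it row by row at query time.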
The bigger the structural change, the more this third option makes sense. If a source table is overhauled dramatically — many columns removed, many added — creating a fresh Satellite to capture the new shape is often the cleanest path forward.
When the Business Key Changes: The Complex Case
Column-level changes are manageable. Business key changes are where things get genuinely complex — and where the risk calculus shifts significantly.
A business key in Data Vault must be unique over time and across the enterprise. If the current key no longer meets that standard — say, a customer number that was once reliable is now duplicated across regions — you have a structural problem that can cascade through the model. Changing the business key means potentially changing the Hub itself, which in turn affects every Link that references that Hub, and every Satellite attached to that Hub and those Links. The impact can be wide.
At this point, you have two main strategic choices.
The Data Vault Handbook:
Core Concepts and Modern Applications
Build Your Path to a Scalable and Resilient Data Platform
The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.
Option A: Keep Old and New Structures Separate
The lower-risk approach — and the one most commonly recommended — is to leave the historical Raw Data Vault exactly as it is and build a new Raw Data Vault to capture data under the new structure and key definition.
The reasoning is rooted in a core Data Vault principle: the Raw Data Vault should model data close to how the source systems use it. The business had one structure in the past and a different one going forward. That’s two different realities, and it makes sense to model them separately.
The two Raw Data Vaults then get reconciled in the Business Vault, where business logic handles the combination of old and new data. This might be straightforward — a simple union — or it might be complex, especially if field definitions have changed. For example, if an address field was previously structured (street, house number, zip, city) and is now a free-text memo field that may contain addresses from multiple countries, the logic to normalize and combine that data belongs in the Business Vault. That’s exactly what the Business Vault is designed for.
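A minimal sketch of that reconciliation, using the address example and hypothetical names (`sat_address_v1` for the old structured fields, `sat_address_v2` for the new free-text memo): the Business Vault view exposes one unified shape over both Raw Vault generations.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Old Raw Vault generation: structured address fields
    CREATE TABLE sat_address_v1 (hk_customer TEXT, load_datetime TEXT,
                                 street TEXT, house_no TEXT, zip TEXT, city TEXT);
    -- New generation: a single free-text memo field
    CREATE TABLE sat_address_v2 (hk_customer TEXT, load_datetime TEXT,
                                 address_memo TEXT);
    INSERT INTO sat_address_v1 VALUES ('hk1', '2022-01-01', 'Main St', '5', '12345', 'Springfield');
    INSERT INTO sat_address_v2 VALUES ('hk2', '2024-01-01', 'Baker Street 221b, London');

    -- Business Vault view: the business logic that combines both realities
    CREATE VIEW bv_customer_address AS
    SELECT hk_customer, load_datetime,
           street || ' ' || house_no || ', ' || zip || ' ' || city AS address_line
    FROM sat_address_v1
    UNION ALL
    SELECT hk_customer, load_datetime, address_memo
    FROM sat_address_v2;
""")
rows = con.execute("SELECT * FROM bv_customer_address ORDER BY hk_customer").fetchall()
print(rows)
```

In a real model the normalization logic would be far richer (country-aware parsing of the memo field, for instance), but it would live in exactly this layer.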
This approach carries the lowest risk. Historical data is never touched. Nothing can go wrong with data that hasn’t been moved.
Option B: Refactor the Raw Data Vault
The more ambitious option is to refactor the existing Raw Data Vault into a new version — modifying Hubs, Links, and Satellites to reflect the new structure — and then reconstruct historical data within that new model.
This is technically possible, but it comes with a hard requirement: you must be able to reconstruct every historical delivery from the new structure without any data loss. In Data Vault practice, this is validated through what’s known as the “Jedi test” — deriving the old structures from the new ones and verifying the output matches the original data exactly. If the test passes, you can safely drop the old tables and replace them with views that expose the old structure as a backward-compatible interface.
Those views give existing queries time to continue working while users migrate. But they’re a transitional tool, not a permanent one. You’ll want to communicate a clear deprecation timeline — 90 or 180 days is typical — and give users explicit guidance on how to update their queries before the views are dropped.
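The test-then-swap sequence can be sketched as follows, with a deliberately simple hypothetical case (`sat_order_old` refactored into `sat_order_v2`, where a column was renamed and a new attribute added, so the old shape is fully derivable):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sat_order_old (hk_order TEXT, load_datetime TEXT, amount REAL);
    -- Refactored Satellite: same facts, renamed column plus a new attribute
    CREATE TABLE sat_order_v2  (hk_order TEXT, load_datetime TEXT,
                                order_amount REAL, order_channel TEXT);
    INSERT INTO sat_order_old VALUES ('hk1', '2023-05-01', 100.0);
    INSERT INTO sat_order_v2  VALUES ('hk1', '2023-05-01', 100.0, 'web');
""")
# "Jedi test": derive the old shape from the new structure and diff it
# against the original data; the difference must be empty (a full test
# would also check the opposite direction with EXCEPT reversed).
diff = con.execute("""
    SELECT hk_order, load_datetime, amount FROM sat_order_old
    EXCEPT
    SELECT hk_order, load_datetime, order_amount FROM sat_order_v2
""").fetchall()
assert diff == []

# Test passed: drop the old table and expose the old shape as a
# backward-compatible view until the communicated deprecation date.
con.executescript("""
    DROP TABLE sat_order_old;
    CREATE VIEW sat_order_old AS
    SELECT hk_order, load_datetime, order_amount AS amount FROM sat_order_v2;
""")
legacy = con.execute("SELECT amount FROM sat_order_old").fetchall()
print(legacy)  # [(100.0,)]
```

Existing queries against `sat_order_old` keep working unchanged — which is exactly why the eventual drop of the view needs the deprecation timeline described above.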
A word of warning: when those views eventually get dropped, expect complaints. Not because the communication failed, but because, as a rule, nobody reads emails. Plan for it.
Handling Non-Unique Business Keys in the Interim
If a business key loses its uniqueness mid-project and a full refactoring effort will take several sprints, there’s a practical interim solution: a Record Source Tracking Satellite. This allows you to continue working with the existing model while the refactoring is planned and executed in the background. It buys time without requiring an immediate structural overhaul, and it keeps the data pipeline running cleanly during the transition.
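One common shape for such a tracking Satellite — sketched here with hypothetical names, not a prescribed layout — records which source system delivered which business key on each load, so collisions stay visible and queryable while the refactoring proceeds:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE rts_customer (          -- Record Source Tracking Satellite
        hk_customer   TEXT NOT NULL,     -- hash key of the Hub entry
        record_source TEXT NOT NULL,     -- system that delivered the key
        load_datetime TEXT NOT NULL,
        PRIMARY KEY (hk_customer, record_source, load_datetime)
    )
""")
# The same customer number now arrives from two regional systems:
con.executemany("INSERT INTO rts_customer VALUES (?, ?, ?)", [
    ('hk_c1001', 'crm_emea', '2024-04-01'),
    ('hk_c1001', 'crm_apac', '2024-04-01'),
])
# Keys delivered by more than one source expose the uniqueness violation:
dupes = con.execute("""
    SELECT hk_customer, COUNT(DISTINCT record_source) AS n_sources
    FROM rts_customer
    GROUP BY hk_customer
    HAVING n_sources > 1
""").fetchall()
print(dupes)  # [('hk_c1001', 2)]
```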
Communication: The Overlooked Part of Refactoring
Technical decisions aside, refactoring a Data Vault model is also an organizational event. Users who query your data warehouse need to know when structures change — whether that’s a modified Satellite, a new Hub, or a deprecated view that will be removed in three months.
A simple data warehouse changelog or newsletter goes a long way. When you modify existing entities, inform users. When you introduce views as backward-compatible bridges, tell them the timeline. When the views are going away, tell them what to query instead. This isn’t just good practice — it’s the difference between a smooth migration and a flood of support tickets.
The Bottom Line on Refactoring
Data Vault is built to absorb change, and it does so gracefully at the column level. Descriptive attribute changes — new columns, removed columns, restructured Satellites — are handled with well-defined options and minimal risk. The real challenge arrives when business keys change, because the ripple effects can touch Hubs, Links, and Satellites across the model.
In those cases, the recommended approach is to preserve historical data in the original Raw Data Vault and build a new one for the new structure, using the Business Vault as the reconciliation layer. It’s the lowest-risk path, it keeps your historical data intact, and it puts complex transformation logic exactly where it belongs.
To learn more about Data Vault modeling principles, refactoring strategies, and Business Vault patterns in depth, explore our Data Vault 2.1 Training & Certification. And if you’re new to the methodology, the free Data Vault handbook is a great starting point — available as a hard copy or digital download.