Understanding the Role of Reference Data
Reference data keeps a data warehouse consistent and gives business context to the data it describes. But how should reference or master data be modeled effectively, especially in Data Vault 2.0 with WhereScape?
This article breaks down best practices and modeling techniques to help data engineers and architects manage reference data in a scalable, maintainable way.
What is Reference Data?
Reference data consists of values that are used to categorize or describe other data within business systems. It typically includes code-to-description mappings that offer meaning to otherwise abstract identifiers.
Unlike business keys, which identify business entities or objects (like customers or products), reference data keys do not directly point to business objects. They simply support them with contextual information.
Examples of Reference Data:
- ISO codes for countries
- Official country names
- Continent or region classifications
- Currency types or codes
One critical aspect of reference data is that it can change over time. Country names may change, new currencies may be introduced, and existing classifications may be updated. This means that how we model and store this data must account for such changes.
Data Vault 2.0 and Reference Data Modeling
Data Vault 2.0 introduces a methodology designed for agility, auditability, and scalability in enterprise data warehousing. When dealing with reference data in this architecture, the recommended standard involves two main components:
- Reference Table (Reference Hub)
- Reference Satellite
Each serves a distinct purpose and helps manage both static and changing attributes efficiently.
The Reference Table (Reference Hub)
The reference table acts similarly to a hub in traditional Data Vault modeling, but with important distinctions:
- Contains reference codes or keys (e.g., ISO country code)
- Does not use a hash key – unlike typical hubs
- May include additional static attributes that do not change over time
This component provides a centralized location for managing consistent lookup values across the enterprise. Although it is often called a “reference hub,” it is specialized for reference data and differs from a standard hub in both structure and intent.
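As a minimal sketch of that structure, the Python snippet below uses the standard-library sqlite3 module to create a reference table keyed by the natural code. The table name ref_country, the iso_alpha3 static attribute, the load_date and record_source audit columns, and the iso_feed source label are assumptions made for illustration; they are not taken from WhereScape or from any specific implementation.

```python
import sqlite3


def create_reference_table(conn):
    # The natural reference code is the primary key; there is no hash key column.
    conn.execute(
        """
        CREATE TABLE ref_country (
            country_code  TEXT PRIMARY KEY,  -- ISO 3166-1 alpha-2 code
            iso_alpha3    TEXT NOT NULL,     -- static attribute, not expected to change
            load_date     TEXT NOT NULL,     -- hypothetical audit column
            record_source TEXT NOT NULL      -- hypothetical audit column
        )
        """
    )


def load_reference_codes(conn, rows, load_date, record_source):
    # INSERT OR IGNORE skips codes that already exist, so reloads are idempotent.
    conn.executemany(
        "INSERT OR IGNORE INTO ref_country VALUES (?, ?, ?, ?)",
        [(code, alpha3, load_date, record_source) for code, alpha3 in rows],
    )


conn = sqlite3.connect(":memory:")
create_reference_table(conn)
load_reference_codes(conn, [("DE", "DEU"), ("US", "USA")], "2024-01-01", "iso_feed")
# The second load repeats "US"; the existing row is left untouched.
load_reference_codes(conn, [("US", "USA"), ("FR", "FRA")], "2024-02-01", "iso_feed")

codes = [
    row[0]
    for row in conn.execute("SELECT country_code FROM ref_country ORDER BY country_code")
]
print(codes)
```

Keeping the code itself as the key means any table that carries a country code can join to this one directly.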
The Reference Satellite
Reference satellites extend the reference table to store attributes that may evolve over time. This aligns well with the Data Vault 2.0 philosophy of tracking change history and ensuring auditability.
Characteristics of a Reference Satellite:
- Includes reference codes or keys to link back to the reference table
- Stores descriptive attributes that may change over time (e.g., country name updates, new regional classifications)
This design allows data teams to accommodate both historical tracking and the dynamic nature of reference data.
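The following sketch shows one way such a satellite could be loaded so that history is preserved: a new row is written only when the descriptive attributes differ from the latest stored version. The table ref_country_sat, its columns, the load_satellite helper, and the load dates are hypothetical; the rename of Swaziland to Eswatini serves only as sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ref_country_sat (
        country_code TEXT NOT NULL,  -- links back to the reference table
        load_date    TEXT NOT NULL,  -- ISO-8601 text, so string order is date order
        country_name TEXT NOT NULL,
        region       TEXT NOT NULL,
        PRIMARY KEY (country_code, load_date)
    )
    """
)


def load_satellite(conn, rows, load_date):
    """Insert a row only when the attributes differ from the latest stored version."""
    inserted = 0
    for code, name, region in rows:
        latest = conn.execute(
            "SELECT country_name, region FROM ref_country_sat "
            "WHERE country_code = ? ORDER BY load_date DESC LIMIT 1",
            (code,),
        ).fetchone()
        # latest is None for a code not seen before, so the first version is always stored.
        if latest != (name, region):
            conn.execute(
                "INSERT INTO ref_country_sat VALUES (?, ?, ?, ?)",
                (code, load_date, name, region),
            )
            inserted += 1
    return inserted


first = load_satellite(conn, [("SZ", "Swaziland", "Africa")], "2018-01-01")
unchanged = load_satellite(conn, [("SZ", "Swaziland", "Africa")], "2018-02-01")
renamed = load_satellite(conn, [("SZ", "Eswatini", "Africa")], "2018-06-01")

history = conn.execute(
    "SELECT load_date, country_name FROM ref_country_sat ORDER BY load_date"
).fetchall()
print(first, unchanged, renamed, history)
```

Existing rows are never updated, so the earlier name remains available for any report that needs the description valid at a past date.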
Why Model Reference Data This Way?
There are several strategic and operational advantages to modeling reference data using this structure in Data Vault 2.0:
- Separation of concerns: Static and changing attributes are stored in separate structures (reference table vs. reference satellite), so a change to a description never touches the stable list of codes.
- Scalability: Future changes in reference attributes or descriptions are easier to manage and don’t affect historical records.
- Auditability: Data Vault’s natural historization supports full lineage and change tracking, which is ideal for regulated industries.
- Adaptability: Requirements for historization may evolve over time. Modeling reference data into satellites regardless of current needs ensures readiness for future changes.
Best Practices for Implementation
When implementing this in WhereScape, which automates the Data Vault modeling process, follow these best practices:
1. Always Use a Reference Satellite
Even if you don’t need to historize now, model your reference data in a satellite. Future-proofing your model saves costly rework later.
2. Use Reference Hubs When Multiple Sources Exist
If your organization consumes reference data from multiple systems (e.g., two systems providing different descriptions for the same country code), a reference hub helps consolidate and align these variations around the same key.
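One possible layout for this case, sketched under assumptions, keeps a single reference table for the shared code and one satellite per source system. The source names crm and erp, the table names, and the latest_descriptions helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ref_country (country_code TEXT PRIMARY KEY);

    -- One satellite per source; both hang off the same natural code.
    CREATE TABLE ref_country_sat_crm (
        country_code TEXT NOT NULL REFERENCES ref_country (country_code),
        load_date    TEXT NOT NULL,
        country_name TEXT NOT NULL,
        PRIMARY KEY (country_code, load_date)
    );
    CREATE TABLE ref_country_sat_erp (
        country_code TEXT NOT NULL REFERENCES ref_country (country_code),
        load_date    TEXT NOT NULL,
        country_name TEXT NOT NULL,
        PRIMARY KEY (country_code, load_date)
    );

    INSERT INTO ref_country VALUES ('US');
    INSERT INTO ref_country_sat_crm VALUES ('US', '2024-01-01', 'United States');
    INSERT INTO ref_country_sat_erp VALUES ('US', '2024-01-01', 'United States of America');
    """
)

# Fixed mapping of source name to satellite table (no user input reaches the SQL text).
SATELLITES = {"crm": "ref_country_sat_crm", "erp": "ref_country_sat_erp"}


def latest_descriptions(conn, code):
    """Return the most recent description each source holds for one reference code."""
    result = {}
    for source, table in SATELLITES.items():
        row = conn.execute(
            f"SELECT country_name FROM {table} "
            "WHERE country_code = ? ORDER BY load_date DESC LIMIT 1",
            (code,),
        ).fetchone()
        result[source] = row[0] if row else None
    return result


us_names = latest_descriptions(conn, "US")
print(us_names)
```

Both descriptions stay available side by side, and a downstream business rule can decide which source wins for reporting.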
3. Avoid Hash Keys in Reference Hubs
Because reference tables don’t represent business objects, there’s no need for a surrogate hash key. Stick with the natural reference code (e.g., “US” for United States) as your unique identifier.
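To illustrate the contrast, the sketch below gives a business hub a hash key while the reference table is joined on its natural code. The hash_key helper (MD5 over the trimmed, upper-cased business key), the customer tables, and the sample values are assumptions for illustration only.

```python
import hashlib
import sqlite3


def hash_key(business_key):
    # Assumed convention: MD5 over the trimmed, upper-cased business key.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Business object: identified by a hash key derived from the business key.
    CREATE TABLE hub_customer (
        customer_hk TEXT PRIMARY KEY,
        customer_no TEXT NOT NULL
    );
    CREATE TABLE sat_customer (
        customer_hk  TEXT NOT NULL,
        load_date    TEXT NOT NULL,
        country_code TEXT NOT NULL,  -- plain reference code, stored as delivered
        PRIMARY KEY (customer_hk, load_date)
    );
    -- Reference table: the natural code is the key, no hash key involved.
    CREATE TABLE ref_country (
        country_code TEXT PRIMARY KEY,
        iso_alpha3   TEXT NOT NULL
    );
    """
)

hk = hash_key("CUST-001")
conn.execute("INSERT INTO hub_customer VALUES (?, ?)", (hk, "CUST-001"))
conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?)", (hk, "2024-01-01", "US"))
conn.execute("INSERT INTO ref_country VALUES (?, ?)", ("US", "USA"))

rows = conn.execute(
    """
    SELECT h.customer_no, s.country_code, r.iso_alpha3
    FROM hub_customer AS h
    JOIN sat_customer AS s ON s.customer_hk = h.customer_hk
    JOIN ref_country  AS r ON r.country_code = s.country_code
    """
).fetchall()
print(rows)
```

The satellite resolves its country through the code it already stores, so no hash needs to be computed for the lookup.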
4. Design Satellites for Change
Structure your reference satellites so that changing attributes are easy to accommodate, and use effective date fields to track the history of those changes.
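A date-tracked satellite can then answer "what was the description on a given day?" The sketch below assumes a single load_date column acts as the effective date; a separate business-supplied effective date column would work the same way. The table, the name_as_of helper, and the sample dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ref_country_sat (
        country_code TEXT NOT NULL,
        load_date    TEXT NOT NULL,  -- ISO-8601 text, so string comparison follows date order
        country_name TEXT NOT NULL,
        PRIMARY KEY (country_code, load_date)
    )
    """
)
conn.executemany(
    "INSERT INTO ref_country_sat VALUES (?, ?, ?)",
    [
        ("SZ", "2018-01-01", "Swaziland"),
        ("SZ", "2018-06-01", "Eswatini"),
    ],
)


def name_as_of(conn, code, as_of):
    """Return the name from the newest row loaded on or before as_of, or None."""
    row = conn.execute(
        "SELECT country_name FROM ref_country_sat "
        "WHERE country_code = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1",
        (code, as_of),
    ).fetchone()
    return row[0] if row else None


print(name_as_of(conn, "SZ", "2018-03-15"))
print(name_as_of(conn, "SZ", "2019-01-01"))
```

A date earlier than the first load returns None, which makes gaps in reference history visible instead of silently defaulting to the current description.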
Common Pitfalls to Avoid
- Modeling reference data as business hubs – this confuses context with core business entities
- Skipping the satellite – even when attributes are static today, change is inevitable
- Using hash keys unnecessarily – keep your design clean and minimal in reference structures
- Ignoring multiple source issues – consolidate differing descriptions with a reference hub
Conclusion
Modeling reference data correctly is a small but critical part of building a reliable, scalable, and auditable Data Vault. By following the recommended structure, a reference table paired with a reference satellite, you create a flexible and future-proof design.
WhereScape users benefit from automation, but understanding these modeling principles ensures you’re applying the tool in a way that aligns with industry best practices and prepares your warehouse for long-term success.
Whether you’re handling ISO country codes or global currency classifications, treat your reference data with the same care you would your core business entities, because it gives those entities their context.
Meet the Speaker

Trung Ta
Senior Consultant
Trung has been a Senior BI Consultant since 2019. He is a Certified Data Vault 2.0 Practitioner at Scalefree, and his areas of expertise include data warehousing in cloud environments as well as Data Vault 2.0 modeling and implementation – especially, but not limited to, with WhereScape 3D/RED. He has worked with industry leaders in the insurance and finance sectors, advising them on building their own Data Vault 2.0 solutions.