Skip to main content
search
0
Scalefree Knowledge Webinars Data Vault Friday When to Use Reference Tables in Data Vault?

Reference Tables in Data Vault

In modern Data Vault 2.0 implementations, teams often face a recurring question: “We have dozens of small, static lookup tables—should we model them as full Hubs and Satellites, or can we use simpler reference tables?” If you’re dealing with Excel sheets containing tens or hundreds of rows of relatively stable data (like Profit Centers, Status Codes, or Region mappings), this article will help you decide when a lightweight reference table suffices—and when you need the auditability of a Hub/Satellite pattern.



Understanding Business Data vs. Reference Data

First, it helps to distinguish two broad categories of data:

  • Business Objects: Entities that your processes create and update constantly—Customers, Orders, Products, etc. You generate new keys and change descriptive attributes frequently.
  • Reference Data: Code lists and lookup tables that describe or classify business objects—Country codes, Profit Center codes, Contract types. These change infrequently and usually in small batches.

Although some tables can straddle the line (e.g., Profit Centers may be “business objects” for accounting teams), it’s often safe to treat truly stable code lists as reference data for modeling purposes.

Simple Reference Tables: Pros and Cons

A simple reference table in your Data Vault is nothing more than a flat table with:

  • Primary Key: Your reference code (e.g., profit_center_code).
  • Attributes: The 2–5 descriptive columns you need (e.g., profit_center_name, region).
  • No History: Only the current state is stored; updates overwrite existing rows.

Advantages: Easy to implement, minimal objects, straightforward joins at query time.
Disadvantages: No built-in historical tracking—updates will retroactively change past reports, and you cannot reconstruct previous descriptions.

When Simple Reference Tables Are Appropriate

Consider a flat reference table when:

  • Your business requirement only needs the latest values.
  • Updates are extremely rare (quarterly or less) and don’t require audit trails.
  • Performance of lookups is not mission-critical (small table sizes).
  • You have no regulatory or internal need to reproduce past descriptions.

If any of these criteria fail—especially auditability—then a simple reference table can become a liability.

Introducing Reference Hubs and Reference Satellites

To combine simplicity with history, use the Reference Hub & Reference Satellite pattern. This mirrors the standard Hub/Satellite design, but optimized for code lists:

  • Reference Hub:
    • business_key: the code (e.g., PROFCTR_001)
    • record_source: data origin
    • load_date: date the code list was loaded
  • Reference Satellite:
    • business_key (FK to Hub)
    • load_date (also part of PK)
    • Descriptive attributes (e.g., name, region, valid_from)

This approach captures every change to your reference data without overwriting, and still keeps your model lightweight.

How It Works in Practice

  1. Load the Hub: Insert every code once (or refresh if new codes appear).
  2. Load the Satellite: For each code, insert a new row whenever any descriptive attribute changes, tagging it with the load_date.
  3. Querying: In your dimension or Information Delivery layer, join from your business object Satellite (or Link) directly into the Reference Satellite on code, filtering to the row with the latest load_date ≤ transaction date.

Because reference tables are small, these joins remain performant even when you compare on dates.

Aligning Reference Data with Business Vault Snapshots

For organizations using a Business Vault layer with snapshot dates, you may need to “time-align” reference data. Two patterns are common:

  • On-the-fly alignment: In your reporting view, use the transaction’s snapshot_date and join to the Reference Satellite where load_date ≤ snapshot_date, picking the latest record.
  • PIT/Bridge tables: Precompute “Point-In-Time” (PIT) tables that store the reference code’s surrogate key aligned to each business object snapshot for faster querying.

Choose the pattern that balances your performance SLAs with data freshness requirements.

Auditability and Regulatory Compliance

If you operate in regulated industries (banking, telecom, government), audit trails are mandatory. The Reference Hub/Satellite pattern ensures:

  • Complete lineage and history of every code change.
  • Reproducibility of past reports with original reference descriptions.
  • Ability to support retrospective analyses without reloading or reconstructing data.

Even if your initial business users only ask for current values, future sprints or stakeholders may require historical context—so building auditability upfront can save costly refactoring.

Performance Considerations

Reference tables typically contain at most hundreds of rows. However, you should still consider:

  • Indexing: Ensure load_date and business_key are indexed for fast lookups.
  • Partitioning: Generally unnecessary for small tables but useful if your Satellite grows into thousands of deltas over years.
  • Join Strategy: In most SQL engines, joining a large transaction Satellite to a small Reference Satellite on code + latest date filter is efficient. If not, consider a PIT table.

Governance and Knowledge Transfer

Whatever pattern you choose, document and govern your reference data:

  • Maintain an authoritative data dictionary describing each code list, source, update frequency, and steward.
  • Set up automated tests (e.g., CI/CD validations) to detect unexpected code changes.
  • Implement alerts for large volumes of reference updates that may indicate data quality issues.

Decision Checklist

Use this quick checklist when evaluating a table for reference modeling:

  1. Is the table truly static or slow-changing? (Quarterly or less)
  2. Are there audit or historical requirements? (Regulatory or future use cases)
  3. Is the table small enough (< 1,000 rows) to avoid performance concerns?
  4. Do you need to reconstruct past reports with original descriptions?
  5. Would a simple change in the future (e.g., retro-active update) break historical reports if you used a flat table?

If you answered “yes” to questions 2 or 5, the Reference Hub/Satellite pattern is the safer choice. Otherwise, a simple reference table may be sufficient.

Conclusion

Static lookup tables in a Data Vault 2.0 implementation can be modeled either as simple reference tables or with a Reference Hub & Satellite pattern. While flat tables are easier to build, they lack historical tracking and auditability. By adopting the Reference Hub/Satellite approach, you gain full change history, reproducible reporting, and alignment with regulatory demands—while retaining a lightweight design.

Use the decision checklist above to guide your modeling choices, and ensure your Data Vault remains both agile and compliant as your organization’s needs evolve.

Watch the Video

Meet the Speaker

Profile picture of Michael Olschimke

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

The Data Vault Handbook

Build your path to a scalable and resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free

Leave a Reply

Close Menu