Deriving Dimensions from Reference Data

Welcome to another episode of Data Vault Friday! I’m Michael Olschimke, CEO of Scalefree. Today’s question comes from our online form, and it’s about deriving dimensions from reference data in the raw Data Vault. The questioner has several lookup reference tables that they add as Hubs or reference tables while creating the raw Data Vault. For example, they have a region table that includes a region code, a description, a language code, and valid-from and valid-to dates.

In this article:

Understanding Degenerate Dimensions
Building the Model
Creating a Degenerate Dimension
Handling Time-Based Data
Performance Considerations
Conformed Dimensions
Implementation Steps
Conclusion

Understanding Degenerate Dimensions

The question mentions a “degenerated dimension.” To clarify, a degenerate dimension is a dimension attribute, such as the region code, that is stored in the fact table itself without any additional descriptions. It has no separate dimension table.

The Data Vault Handbook:
Core Concepts and Modern Applications

Build Your Path to a Scalable and Resilient Data Platform

The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.

Read it for Free

Building the Model

To illustrate this, let’s start with a basic structure. Imagine you have a non-historized link containing transaction data, and it references Hubs such as Hub Customer and Hub Product. Additionally, you have reference tables, like a region reference table with a region code and associated descriptions. Here’s a simplified model:

Non-historized Link: Contains transaction data.
Hub Customer and Hub Product: Reference customer and product data.
Reference Hub for Region: Contains the region code.
Reference Satellite for Region: Contains the descriptions, language codes, and valid dates.

This setup allows for capturing changes in reference data, making the model auditable and maintaining historical accuracy.
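The model above can be sketched as a handful of tables. This is a minimal illustration using Python's built-in sqlite3 module, not the structure from the video: every table and column name (nh_link_transaction, ref_region, ref_region_sat, and so on) is an assumption made for the example, and standard Data Vault metadata columns such as load date and record source are left out for brevity.

```python
import sqlite3

# All table and column names are illustrative assumptions.
# Load date, record source, and other metadata columns are omitted for brevity.
DDL = """
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT NOT NULL);
CREATE TABLE hub_product  (product_hk  TEXT PRIMARY KEY, product_id  TEXT NOT NULL);

-- Reference hub: one row per region code; the code itself is the key.
CREATE TABLE ref_region (region_code TEXT PRIMARY KEY);

-- Reference satellite: descriptive versions of each region code.
CREATE TABLE ref_region_sat (
    region_code   TEXT NOT NULL REFERENCES ref_region (region_code),
    language_code TEXT NOT NULL,
    valid_from    TEXT NOT NULL,
    valid_to      TEXT NOT NULL,
    description   TEXT NOT NULL,
    PRIMARY KEY (region_code, language_code, valid_from)
);

-- Non-historized link: one row per transaction, carrying the region code.
CREATE TABLE nh_link_transaction (
    transaction_hk   TEXT PRIMARY KEY,
    customer_hk      TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    product_hk       TEXT NOT NULL REFERENCES hub_product (product_hk),
    region_code      TEXT NOT NULL,
    transaction_date TEXT NOT NULL,
    amount           REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Keeping the region code as a plain column on the non-historized link is what later allows it to be exposed directly as a degenerate dimension.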

Creating a Degenerate Dimension

To create a degenerate dimension from the reference data, follow these steps:

Include the Code in the Fact Table: Add the region code directly into your transaction data (the non-historized link).
Determine the Required Attributes: Decide if you only need the region code or additional attributes like the region name.
Create a Fact View: If you only need the region code, simply create a fact view that includes this code.
Prejoin Additional Attributes: If you need additional attributes, prejoin the reference Hub and Satellite to get the region name or other details based on the timeline of your facts.
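The steps above can be sketched as two views over the same non-historized link. This is an illustration under stated assumptions: the table, view, and column names are hypothetical, the reference satellite is reduced to one description per region code and language (so the validity dates are ignored here), and the language filter 'en' is an arbitrary choice for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables for illustration only.
CREATE TABLE nh_link_transaction (
    transaction_hk TEXT PRIMARY KEY,
    region_code    TEXT NOT NULL,
    amount         REAL NOT NULL
);
CREATE TABLE ref_region_sat (
    region_code   TEXT NOT NULL,
    language_code TEXT NOT NULL,
    description   TEXT NOT NULL,
    PRIMARY KEY (region_code, language_code)
);
INSERT INTO nh_link_transaction VALUES ('t1', 'EU', 100.0), ('t2', 'NA', 250.0);
INSERT INTO ref_region_sat VALUES
    ('EU', 'en', 'Europe'), ('EU', 'de', 'Europa'), ('NA', 'en', 'North America');

-- Code only: the region code is exposed as a degenerate dimension.
CREATE VIEW fact_transaction AS
SELECT transaction_hk, region_code, amount
FROM nh_link_transaction;

-- Prejoined: the region name is added from the reference satellite.
-- LEFT JOIN keeps facts whose code has no description yet.
CREATE VIEW fact_transaction_named AS
SELECT t.transaction_hk, t.region_code, s.description AS region_name, t.amount
FROM nh_link_transaction AS t
LEFT JOIN ref_region_sat AS s
  ON s.region_code = t.region_code
 AND s.language_code = 'en';
""")

code_only = conn.execute(
    "SELECT * FROM fact_transaction ORDER BY transaction_hk").fetchall()
named = conn.execute(
    "SELECT * FROM fact_transaction_named ORDER BY transaction_hk").fetchall()
print(code_only)
print(named)
```

Both views return one row per transaction. The second one only widens the fact with a descriptive column, so the grain of the fact does not change.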

Handling Time-Based Data

When dealing with time-based data, it’s essential to identify the correct version of your reference data. If you want the latest description of the region (a Type 1 dimension), you can join the latest entry. For a Type 2 dimension (tracking changes over time), join based on the fact timestamp to match the correct version of the region name.
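The difference between the two joins can be shown on a small example. The sketch below makes several assumptions: hypothetical table and column names, ISO-formatted date strings (which compare correctly as text), and validity intervals that are half-open, meaning a version is valid from valid_from up to but not including valid_to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables for illustration only.
CREATE TABLE ref_region_sat (
    region_code TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    valid_to    TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (region_code, valid_from)
);
CREATE TABLE nh_link_transaction (
    transaction_hk   TEXT PRIMARY KEY,
    region_code      TEXT NOT NULL,
    transaction_date TEXT NOT NULL,
    amount           REAL NOT NULL
);
INSERT INTO ref_region_sat VALUES
    ('EU', '2020-01-01', '2023-01-01', 'Europe'),
    ('EU', '2023-01-01', '9999-12-31', 'Europe (EMEA)');
INSERT INTO nh_link_transaction VALUES
    ('t1', 'EU', '2022-06-15', 100.0),
    ('t2', 'EU', '2023-06-15', 250.0);
""")

# Type 1: every fact gets the latest description of its region.
TYPE_1_SQL = """
SELECT t.transaction_hk, s.description
FROM nh_link_transaction AS t
JOIN ref_region_sat AS s ON s.region_code = t.region_code
WHERE s.valid_from = (
    SELECT MAX(x.valid_from)
    FROM ref_region_sat AS x
    WHERE x.region_code = t.region_code
)
ORDER BY t.transaction_hk
"""

# Type 2: every fact gets the description that was valid on the fact date.
TYPE_2_SQL = """
SELECT t.transaction_hk, s.description
FROM nh_link_transaction AS t
JOIN ref_region_sat AS s
  ON s.region_code = t.region_code
 AND t.transaction_date >= s.valid_from
 AND t.transaction_date <  s.valid_to
ORDER BY t.transaction_hk
"""

type_1 = conn.execute(TYPE_1_SQL).fetchall()
type_2 = conn.execute(TYPE_2_SQL).fetchall()
print(type_1)  # both transactions show 'Europe (EMEA)'
print(type_2)  # t1 shows 'Europe', t2 shows 'Europe (EMEA)'
```

With the Type 1 join, renaming a region changes how old transactions are labeled. With the Type 2 join, each transaction keeps the name that applied when it happened.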

Performance Considerations

Reference tables typically contain a relatively small amount of data, which allows most joins to be efficient. However, if performance becomes an issue, you can consider creating a Point-in-Time (PIT) table in the Business Vault. This table can precompute the current description for each region on a daily basis, making joins faster and more efficient.
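One possible shape for such a PIT table is sketched below. It is an assumption-laden illustration rather than a prescribed design: the names pit_region, snapshot, and sat_valid_from are hypothetical, the snapshot range is only four days, and the PIT row stores a pointer to the valid satellite version instead of the description itself. The point is that the range comparison is evaluated once, when the PIT table is loaded, so the fact query only needs equality joins.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables for illustration only.
CREATE TABLE ref_region_sat (
    region_code TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    valid_to    TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (region_code, valid_from)
);
CREATE TABLE nh_link_transaction (
    transaction_hk   TEXT PRIMARY KEY,
    region_code      TEXT NOT NULL,
    transaction_date TEXT NOT NULL,
    amount           REAL NOT NULL
);
CREATE TABLE pit_region (
    snapshot_date  TEXT NOT NULL,
    region_code    TEXT NOT NULL,
    sat_valid_from TEXT NOT NULL,
    PRIMARY KEY (snapshot_date, region_code)
);
CREATE TABLE snapshot (snapshot_date TEXT PRIMARY KEY);
INSERT INTO ref_region_sat VALUES
    ('EU', '2020-01-01', '2023-01-01', 'Europe'),
    ('EU', '2023-01-01', '9999-12-31', 'Europe (EMEA)');
INSERT INTO nh_link_transaction VALUES
    ('t1', 'EU', '2022-12-31', 100.0),
    ('t2', 'EU', '2023-01-02', 250.0);
""")


def daily_snapshots(start: date, end: date):
    """Yield one ISO date string per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


conn.executemany(
    "INSERT INTO snapshot VALUES (?)",
    [(d,) for d in daily_snapshots(date(2022, 12, 30), date(2023, 1, 2))],
)

# Load the PIT table: resolve the valid satellite version once per day and region.
conn.execute("""
INSERT INTO pit_region (snapshot_date, region_code, sat_valid_from)
SELECT d.snapshot_date, s.region_code, s.valid_from
FROM snapshot AS d
JOIN ref_region_sat AS s
  ON d.snapshot_date >= s.valid_from
 AND d.snapshot_date <  s.valid_to
""")

pit_rows = conn.execute("SELECT COUNT(*) FROM pit_region").fetchone()[0]

# The fact query now uses equality joins only.
facts = conn.execute("""
SELECT t.transaction_hk, s.description
FROM nh_link_transaction AS t
JOIN pit_region AS p
  ON p.snapshot_date = t.transaction_date
 AND p.region_code   = t.region_code
JOIN ref_region_sat AS s
  ON s.region_code = p.region_code
 AND s.valid_from  = p.sat_valid_from
ORDER BY t.transaction_hk
""").fetchall()
print(pit_rows, facts)
```

Storing the description directly in the PIT row would remove the second join at the cost of a wider table. Which trade-off is better depends on the platform and the data volume.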

Conformed Dimensions

If you prefer to use a conformed dimension, convert your reference table into a dimension table. Use the primary key of the reference table (e.g., region code) as the dimension identifier. This approach involves joining the reference Hub and Satellite to create a dimension view that your fact views can reference through that key.

Implementation Steps

Turn Reference Table into Dimension: Join the reference Hub and Satellite to create a dimension view.
Use Reference Code as Dimension Key: The region code becomes the dimension key.
Create Fact View: Include the dimension key in your fact view and join the necessary attributes from the dimension view.
Configure in Dashboard: Set up relationships between your facts and dimensions in your dashboard application for seamless data visualization.
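The first three steps can be sketched as a dimension view, a fact view, and a query that relates them. The names dim_region, fact_transaction, and region_key are hypothetical, and the dimension is assumed to be Type 1, showing only the latest description per region. The final query stands in for the relationship a dashboard tool would set up between the fact and the dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables for illustration only.
CREATE TABLE ref_region (region_code TEXT PRIMARY KEY);
CREATE TABLE ref_region_sat (
    region_code TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (region_code, valid_from)
);
CREATE TABLE nh_link_transaction (
    transaction_hk TEXT PRIMARY KEY,
    region_code    TEXT NOT NULL,
    amount         REAL NOT NULL
);
INSERT INTO ref_region VALUES ('EU'), ('NA');
INSERT INTO ref_region_sat VALUES
    ('EU', '2020-01-01', 'Europe'),
    ('EU', '2023-01-01', 'Europe (EMEA)'),
    ('NA', '2020-01-01', 'North America');
INSERT INTO nh_link_transaction VALUES
    ('t1', 'EU', 100.0), ('t2', 'NA', 250.0), ('t3', 'EU', 50.0);

-- Steps 1 and 2: reference Hub + Satellite become a dimension view,
-- keyed by the region code (latest description only, i.e. Type 1).
CREATE VIEW dim_region AS
SELECT h.region_code AS region_key, s.description AS region_name
FROM ref_region AS h
JOIN ref_region_sat AS s ON s.region_code = h.region_code
WHERE s.valid_from = (
    SELECT MAX(x.valid_from)
    FROM ref_region_sat AS x
    WHERE x.region_code = h.region_code
);

-- Step 3: the fact view carries the dimension key.
CREATE VIEW fact_transaction AS
SELECT transaction_hk, region_code AS region_key, amount
FROM nh_link_transaction;
""")

# Stand-in for the dashboard relationship: fact joined to dimension on the key.
sales_by_region = conn.execute("""
SELECT d.region_name, SUM(f.amount) AS total_amount
FROM fact_transaction AS f
JOIN dim_region AS d ON d.region_key = f.region_key
GROUP BY d.region_name
ORDER BY d.region_name
""").fetchall()
print(sales_by_region)
```

Because the dimension key is the business code itself, any other fact view that carries a region code can be related to the same dim_region view, which is what makes the dimension conformed.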

Conclusion

In summary, deriving dimensions from reference data in a Data Vault involves understanding your needs for degenerate or conformed dimensions, handling time-based data appropriately, and ensuring efficient joins. By following these steps, you can create a robust and scalable data model that meets your analytical needs.

Thank you for joining us for this Data Vault Friday session. If you have more questions, submit them at sfr.ee/dvfriday. For additional learning, check out our webinars at Scalefree.to/webinars. If you need answers before next Friday, visit the Data Vault Innovators community we set up with Ignition Data.

Until next time, keep those data questions coming, and we’ll see you next Friday!
