Implement Data Transformations
In today’s data-driven world, organizations need robust, flexible, and scalable solutions to manage and transform their ever-growing volumes of data. Data Vault 2.0 has emerged as a leading methodology for designing enterprise data warehouses that are agile, auditable, and adaptive to change. When paired with powerful automation platforms like WhereScape, it becomes possible to implement these methodologies quickly and efficiently. In this article, we’ll explore where to implement data transformations prior to a Data Vault model in WhereScape and which types of transformations are permitted under Data Vault 2.0.
In this article:
Understanding Transformation Types in Data Vault
Data Vault categorizes transformations into two primary buckets:
- Hard Rules: Technical transformations that align data types and formats without altering the underlying meaning of the data.
- Soft (Business) Rules: Business-driven transformations that may change data meaning, adjust granularity, or apply interpretations to meet business requirements.
Hard Rules
Hard rules are the bedrock of a clean, consistent Raw Data Vault. They consist of simple SQL operations such as CAST
, TRIM
, normalization, and pre-join operations. These transformations must not change the semantic meaning of the data—they simply ensure that data types and formats are consistent before loading into the Raw Vault.
- Data Type Alignment: Converting text-based dates or numbers into proper
DATE
,TIMESTAMP
, or numeric types. - Data Format Standardization: Trimming whitespace, standardizing phone numbers or currency formats, normalizing text cases.
- Pre-join: Flattening complex or hierarchical source data by joining child tables to master tables using surrogate keys to obtain real business keys.
Soft Rules
Soft rules—also known as business rules—are applied later in the Data Vault workflow, typically in the Business Vault or downstream data marts. These rules implement business logic such as KPI calculations, trend analyses, or granular rollups. Unlike hard rules, soft rules may change data meaning, shift granularity, or introduce new interpretations.
- KPI Calculations: Computing metrics like year-to-date sales, customer churn rates, or profit margins.
- Data Interpretations: Categorizing transaction types, deriving risk scores, or labeling customer segments.
- Granularity Adjustments: Aggregating daily data to monthly summaries or drilling transaction data down to event-level detail.
Why Apply Hard Rules Before the Raw Vault?
The core principle of Data Vault 2.0 is to keep early layers of your data warehouse as raw and unaltered as possible. By applying only hard rules before the Raw Vault, you:
- Maintain Auditability: Raw data remains traceable back to its source, preserving lineage.
- Ensure Reusability: Raw Vault structures can serve multiple downstream use cases without pre-judging reporting requirements.
- Reduce Dependency: Soft rules—and their business dependencies—are deferred, minimizing changes that ripple through the entire pipeline.
Implementing Hard Rules in WhereScape
WhereScape’s automation capabilities allow you to define and apply hard rules seamlessly within both the 3D Data Model and the RED (Rapid ELT Development) modules. However, best practice is to centralize these transformations in the 3D model, where the metadata-driven approach can manage attribute-level rules efficiently.
1. Define Data-Type and Format Alignments
- In the Source Model, import your tables and define primary/foreign key relationships based on your source system metadata.
- Switch to the Data Vault Design layer (3D). Select the Hub, Link, or Satellite object and click the attribute you wish to transform.
- Under the Data Transformations tab, add a new transformation expression—e.g.,
CAST(birthdate AS DATE)
—ensuring you choose the correct SQL dialect for your target platform. - Specify the Source (schema or source system) if the rule varies by system.
2. Implementing Pre-Joins for Business Keys
When source tables only provide technical or surrogate keys, you can implement a “pre-join” to fetch the real business key before loading into the Raw Vault:
- In the Source Model, ensure foreign key relationships are defined (or manually create them by dragging and dropping).
- In the Data Vault Design, edit the Link object that originates from the child table (e.g.,
Orders
). - Under Source Mappings, add the parent table (e.g.,
Shippers
) as an additional source entity. - Map the parent’s business key attribute (e.g.,
CompanyName
) to the Link’s business key slot, and move the surrogateShipperID
to a Satellite attribute if desired. - When the model conversion runs, WhereScape will auto-generate the JOIN syntax in the staging area, fetching the business key for the Link’s hash-key computation.
Handling Complex Transformations
While WhereScape allows in-model transformations, it’s prudent to keep hard rules simple to avoid breaking the Raw Vault loading patterns. For more intricate logic—such as multi-step string parsing or nested conditional rules—consider creating dedicated pre-processing views:
- Create a database view (or staging table) on top of the raw source or staging tables.
- Implement your complex SQL logic in that view, ensuring it produces standardized output.
- Point your WhereScape models to these views as the source entities, treating them like any other source table.
Live Demo Walkthrough
In our webinar demo, we used the Northwind sample database in WhereScape 3D to showcase two scenarios:
- Birthdate Alignment: Converting the
Employee.BirthDate
fromDATETIME
toDATE
in the Hub.
Key Steps: SelectHub_Employee
in the Data Vault design, chooseBirthDate
, and addCAST(BirthDate AS DATE)
under Data Transformations. - Pre-Join for Shipper Business Key: Generating a
Link_Order_Shipper
that usesCompanyName
(fromShippers
) instead of the surrogateShipperID
.
Key Steps: In the Link’s Source Mapping, addShippers
as an additional source, mapCompanyName
to the Link’s business key, and moveShipperID
to Satellite.
Upon conversion and deployment:
- The staging objects automatically featured a JOIN between
Orders
andShippers
usingShipperID
→ShipperID
. - The physical Raw Vault tables contained the correct hash-keys and reference columns for the Link.
Best Practices and Tips
- Keep Hard Rules Simple: Avoid multi-layer nested SQL. If logic becomes unwieldy, move it to a view.
- Manage by Source: Use source-specific transformations if you ingest data from multiple systems with differing formats.
- Document Transformations: Leverage WhereScape’s metadata repository to annotate why each transformation exists.
- Test Incrementally: Validate each rule in isolation to ensure it doesn’t break downstream Raw Vault loads.
- Defer Business Logic: Always push soft rules to the Business Vault or reporting layers to maintain Raw Vault purity.
Conclusion
Implementing data transformations in a Data Vault architecture requires a clear separation between technical (hard) rules and business (soft) rules. By applying hard rules—such as data type alignment, format standardization, and pre-joins—prior to loading your Raw Data Vault, you ensure a clean, consistent foundation that remains adaptable to evolving business needs. WhereScape’s 3D and RED modules provide powerful, metadata-driven tools to define these transformations at the model level, automating the generation of staging logic and downstream objects.
With a disciplined approach—keeping hard rules simple, leveraging pre-processing views for complex logic, and deferring business rules to later layers—you’ll build a scalable, auditable, and high-performance Data Vault 2.0 implementation that stands the test of time.
Watch the Video
Meet the Speaker

Trung Ta
Senior Consultant
Trung has been Senior BI Consultant since 2019. As a Certified Data Vault 2.0 Practitioner at Scalefree, his area of expertise includes Data Warehousing in a cloud environment, as well as Data Vault 2.0 modeling and implementation – especially, but not limited to, with Wherescape 3D/RED. He’s been working with industry leaders in the insurance and finance sector, advising them on building their own Data Vault 2.0 solution.