Skip to main content
All Posts By

Lina Sibbel

Bring your Data Vault automation to the next level – with Datavault4Coalesce

By Scalefree Newsletter No Comments

Join the Webinar:

In last September’s webinar, we explored a new competitor in the highly contested market of data warehouse automation tools: Coalesce!

And everyone who watched that webinar might remember that at the end, we announced an even closer relationship between Coalesce and Scalefree and a commitment to bring Scalefree’s best practices into Coalesce!

For those who didn’t watch the webinar, you can find it here.

Recap: What is Coalesce?

Coalesce is a Data Transformation solution made for Snowflake. When working with Coalescse, you build directed acyclic graphs (DAG) which contain nodes. A node represents any physical database object, like tables or views, or even stages and external tables.

Coalesce itself is built around metadata that stores table and column level information which describes the structure inside your data warehouse. This metadata-focused design enables a couple of features which strongly drive towards scalability and manageability. 

All the metadata allows a team to track past, current and desired states of the data warehouse by deeply integrating it and all the workflows that it brings. Additionally users can define standardized patterns and templates on column- and table-level.

How can Data Vault jump in here?

These mentioned templates open up the gate to implement Data Vault 2.0 related patterns and best-practices. Especially on table level, it might quickly come to mind that you could try to build a template for a Hub or a Link.

On column level, this could be a repeated transformation which is then managed in only one so-called macro, which makes it very easy to implement changes with low to zero impact. You could think of hashkey calculation or virtual load-end-dating here.

And that is exactly what we at Scalefree did since the webinar last year. Lead-developers from Coalesce sat together with Data Vault experts and developers from Scalefree with one goal: Create something amazing that helps users to automate their Data Vault 2.0 implementation!

How fast can I build a Data Vault? Yes!

This cooperation created Datavault4Coalesce, an open source extension package for Coalesce, which will be available on March 16th! Let’s have a sneak peek at a selection of what users can do with Datavault4Coalesce.

The first release of Datavault4Coalesce will feature a basic set of Data Vault 2.0 entities:

  • Stages
  • Hubs
  • Links
  • Satellites
  • PITs
  • Snapshot Control Tables

While providing DDL and DML templates for the entity types mentioned above, Datavault4Coalesce makes use of Coalesce’s ability to define the UI for each node type. For stages, this means that users can decide if they want Datavault4Coalesce to automatically generate ghost records or not, as shown in the screenshot below:

This Data Vault related interface can be found across all node types and allow users to conveniently adjust Datavault4Coalesce to fit their requirements!

How can I start?

First of all a bummer, Datavault4Coalesce will only be available starting from the 16th of March. But there is no reason to wait that long to dive into Coalesce itself! Since Coalesce is now part of Snowflake Partner Connect, it’s never been easier to get your hands into a fresh Coalesce environment!

Just sign up for a free Snowflake trial here and initialize your Coalesce experience within seconds by accessing the Partner Connect portal! Then, you just have to load any desired data into it and you can start building your data pipelines with Coalesce. And when the 16th of March finally arrives, you just have to add Datavault4Coalesce to your Coalesce environment – and now you can start to build Data Vault faster than ever!

Also, don’t miss out our next webinar which will be held at the day of launch, the 16th of March, where we will show you Datavault4Coalesce in action. Sign up here!

Get Updates and Support

Please send inquiries and feature requests to [email protected]

For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.

To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.

Scalefree
TurboVault4dbt Logo

Speed up Your Data Vault 2.0 Implementation – with TurboVault4dbt

By Scalefree Newsletter No Comments

Join the Webinar:

Scalefree has released TurboVault4dbt, an open source package that provides automated model generation using DataVault4dbt-compatible templates based on your sources’ metadata.

TurboVault4dbt currently supports metadata input from Excel, GoogleSheets, BigQuery and Snowflake and helps your business with:

  • Speeding up the development process, reducing development costs and producing faster results
  • Encouraging users to analyze and understand their source data
dbt Logo

‘Isn’t every model kind of the same?’

Datavault4dbt is the result of years of experience in creating and loading Data Vault 2.0 solutions forged into a fully auditable solution for your Data Vault 2.0 powered Data Warehouse using dbt.

But every developer who has worked with the package or has created dbt-models for the Raw Vault must have come across one nuisance:
Creating a new dbt-model for a table basically means taking the already existing template and providing it with specific metadata for that table. Doing this over and over again can be quite a chore. Which is why we created TurboVault4dbt to automate and speed up this process.

FROM CTRL+C AND CTRL+V TO A SIMPLE MOUSE-CLICK

How many times has everyone pressed CTRL+C then CTRL+V and corrected a few lines of code when creating new dbt-models for the raw vault?

Instead of trying to figure out what the names of your tables and business keys are or what hashing order you want your Hashkey to be generated in, TurboVault4dbt will do all of that for you. All TurboVault4dbt needs is a metadata input where you capture the structure of your data warehouse.

TurboVault4dbt currently requires a structure of five metadata tables:

  • Hub Entities: This table stores metadata information about your Hubs,
    e.g. (Hub Name, Business Keys, Column Sort Order for Hashing, etc.)
  • Link Entities: This table stores metadata information about your Links,
    e.g. (Link Name, Referenced Hubs, Pre-Join Columns, etc.)
  • Hub Satellites: This table stores metadata information about your Hub Satellites,
    e.g. (Satellite Name, Referenced Hub, Column Definition, etc.)
  • Link Satellites: This table stores metadata information about your Hub Satellites,
    e.g. (Satellite Name, Referenced Link, Column Definition, etc.)
  • Source Data: This table stores metadata information about your Sources,
    e.g. (Source System, Source Object, Source Schema, etc.)

By capturing the metadata in those five tables above, TurboVault4dbt can both extract necessary information and generate every model that is based on a selected source but also, as a user, encourage you to analyze and understand your data.

Lean back, relax and let the machine take over!

Create and fill your metadata tables, connect them to TurboVault4dbt and enjoy your free time for another cup of coffee. Give it a try, or give us your feedback by visiting TurboVault4dbt on GitHub!

Stay updated on TurboVault4dbt through our marketing channels as great features lie ahead!

Get Updates and Support

Please send inquiries and feature requests to [email protected]

For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.

To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.

Scalefree

ABOUT INFORMATION MARTS IN DATA VAULT 2.0

By Scalefree Newsletter No Comments

Join the Webinar:

In the Data Vault 2.0 architecture, information marts are used to deliver information to the end-users.
Conceptually, an information mart follows the same definition as a data mart in legacy data warehousing. However, in legacy data warehousing, a data mart is used to deliver useful information, not raw data. This is why the data mart has been renamed in Data Vault 2.0 to better reflect the use case.

“Classical” information marts

But the definition of information marts has more facets. In the book “Building a Scalable Data Warehouse with Data Vault 2.0” we present three types of marts:

  • Information marts: used to deliver information to business users, typically via dashboards and reports.
  • Metrics Mart: used in conjunction to a Metrics Vault, which captures EDW log data in a Data Vault model. The Metrics Mart is derived from the Metrics Vault to present the metrics in order to analyze performance bottlenecks or in resource consumption of power users and data scientists in managed self-service BI solutions.
  • Error Mart: stores those records that typically fail a hard rule when loading the data into the enterprise data warehouse.

Additional information marts

In addition to these “classical” information marts, we use additional ones in our consulting practice:

  • Interface Mart: this is more or less an information mart, however, the information is not delivered to a human being, e.g. via a dashboard or report. Instead, it is delivered to a subsequent application, or as a write-back, to the source system (for example when using the enterprise data warehouse for data cleansing).
  • Quality Mart: the quality mart is again an information mart, but instead of cleansing bad data, it is used to report bad data. Essentially, it turns the business logic used to cleanse bad data upside down: only bad data, in addition to ugly data at times, is delivered to the end-user, the data steward. This is often done in conjunction with data cleansing frontends where the data steward can either correct source data or comment and tag the exceptions.
  • Source Mart: again an information mart, but this time not using one of the popular schemas, such as star schemas, snowflake schemas or fully denormalized schemas. Instead, the information mart uses the data model of the source application, similar to an operational data store (ODS) schema. However, the Source Mart is not a copy of the data, it is a virtualized model on top of the Data Vault model, reflecting the original structures. It’s great for ad-hoc reporting and provides great value for many data scientists as well as power users.

This concludes our list of information marts. We have used them successfully in projects for our clients to better communicate the actual application of the information marts in their organization.

Let us know in the comments if you think this is helpful for you, too!

Get Updates and Support

Please send inquiries and feature requests to [email protected]

For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.

To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.

Scalefree
Banner datavault4dbt Logo

Kick-start your Data Vault 2.0 Implementation – with datavault4dbt

By Scalefree Newsletter No Comments

Join the Webinar:

Scalefree has released datavault4dbt. An open source package, that provides best-practice loading templates for Data Vault 2.0 entities, embedded into the open source data warehouse automation tool dbt.

Datavault4dbt currently supports Snowflake, BigQuery and Exasol and comes with a lot of great features:

  • A Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt
  • Ready for both Persistent Staging Areas and Transient Staging Areas, due to the allowance of multiple deltas in all macros, without loosing any intermediate changes
  • Creating a centralized, snapshot-based Business interface by using a centralized snapshot table supporting logarithmic logic
  • Optimizing incremental loads by implementing a high-water-mark that also works for entities that are loaded from multiple sources
dbt Logo

Building a Data Vault 2.0 Solution – made easy

The overall goal of releasing Data Vault 2.0 templates for dbt is to combine our years of experience in creating and loading Data Vault 2.0 solutions into publicly available loading patterns and best practices for everyone to use. Out of this ambition, datavault4dbt, an open source package for dbt was created and will be maintained by the Scalefree expert team. 

The most valuable characteristic of datavault4dbt is that it carnates the original Data Vault 2.0 definition by Dan Linstedt. It represents a fully auditable solution for your Data Vault 2.0 powered Data Warehouse. With a straight-forward, standardized approach, it enables the team to conduct agile development cycles.

By allowing multiple increments per batch while loading each Data Vault entity type, datavault4dbt supports both Persistent and Transient Staging Areas without losing any intermediate changes. These incremental loads are even optimized by implementing a dynamic high-water-mark that even works when loading an entity from multiple sources.

Additionally, datavault4dbt encourages strict naming conventions and standards by implementing a variety of global variables that span across all Data Vault layers and supported Databases. The process of end-dating data is completely virtualized to ensure a modern insert-only approach that avoids updating data.

With all these features, datavault4dbt is the perfect solution for your modern Big Data Enterprise Data Warehouse.

From the Stage over the Spine into the PITs

To achieve all this, we worked hard on creating a solid and universal staging area. All hashkeys and hashdiffs are calculated here and users are given the option to add derived columns, generate prejoins with other stages and add ghost records to their data. All of this highly automated based on parameterized user input. 

Based on staging areas, the Data Vault 2.0 spine can be created. Hubs, Links and Non-Historized Links can be loaded from multiple sources including mapping options to ensure business harmonization. 

This spine is then enriched by Standard Satellites, Non-Historized Satellites, Multi-Active Satellites and/or Record-Tracking Satellites. All of those that require it come with a version 0 for tables and a version 1 for end-dated views. 

Based on the Raw Data Vault, PITs can be created automatically, and their loading is backed by an automated, highly-configurable but optional logarithmic snapshot logic. This logic is included in the Control Snapshot Table, which also comes in two consecutive versions. To wrap the logarithmic snapshot logic up, a post-hook for cleaning up all PITs is included and comes in handy.

DataVault4dbt Logo

Start now and boost your Data Vault experience!

The lines above made you think “Nah, that’s all too good to be true!”? Convince yourself, or give us your highly appreciated feedback by visiting datavault4dbt on Github!

Of course, our future ambitions for datavault4dbt are high and next on our list are a lot of important topics, like:

  • Provide a detailed working example of datavault4dbt
  • Extend and migrate the existing documentation of the package
  • Support more and more databases
  • Add more advanced and specific Data Vault 2.0 entities
  • Develop automated Data Vault related tests
  • Review and implement user feedback and suggestions

Stay tuned for more datavault4dbt content on all our marketing channels!

Get Updates and Support

Please send inquiries and feature requests to [email protected]

For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.

To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.

Scalefree