Loading Historical Data in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our BI Consultant Julian Brunner delves into a question from the audience that addresses a common challenge.

“One of our sources delivers all the historical data in one batch. So all the records have the same load date. How can I load the data into the EDW properly?”

In this insightful video, Julian shares practical solutions and strategies for loading historical data into an Enterprise Data Warehouse (EDW) when faced with the unique scenario of receiving all records with the same load date. The question prompts a discussion on best practices to ensure proper handling and integration of historical data within the Data Vault framework.

Julian provides valuable insights into the considerations and steps involved in effectively managing historical data loads, offering guidance on maintaining data integrity and accuracy within the EDW.

Data Vault 2.0 Source System Disaster Recovery

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke engages with a challenging question from our audience, aiming to find an elegant solution to a complex scenario.

“I’m trying to find an elegant way of addressing the following problem.

You have a DV2.0 Insert Only BI deployment fed by multiple OLTP systems. One of these OLTP systems will be subject to a disaster and associated recovery process. This will be done with a loss of 3h worth of data from the OLTP in question. During the 3 hours, multiple loads into the DV were completed.

I’m trying to avoid an effectivity satellite for each hub.”

In this insightful video, Michael explores strategies for handling data from multiple source systems with disaster considerations in a Data Vault 2.0 Insert Only BI deployment. The question prompts a discussion on avoiding the use of an effectivity satellite for each hub, offering alternative approaches to address the challenges posed by data loss during disaster recovery.

Michael shares practical insights and considerations for designing resilient solutions within the Data Vault framework while optimizing the balance between complexity and efficiency.

Data Vault 2.0 Project Tracking

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke addresses a pertinent question from our audience regarding the application of Scrum in Data Vault 2.0.

“We are struggling with the application of Scrum in Data Vault 2.0: the Kanban board is overloaded with technical user stories. However, in theory, the user stories should be oriented towards the business and user needs.”

In this insightful video, Michael delves into the challenges faced when integrating Scrum methodologies into Data Vault 2.0 projects, particularly the issue of an overloaded Kanban board with technical user stories. The question prompts a discussion on the alignment of user stories with business and user needs, emphasizing the importance of maintaining a business-centric focus.

Michael shares practical insights and recommendations for optimizing the use of Kanban boards in Data Vault 2.0 projects, ensuring a balance between technical requirements and business-oriented user stories.

Bring Your Data Vault Automation to the Next Level with DataVault4coalesce


Data Vault Automation with DataVault4coalesce

A cooperation between Scalefree and coalesce.io produced DataVault4coalesce, an open-source extension package for coalesce.io. In a previous webinar, we explored coalesce.io, a new competitor in the highly contested market of data warehouse automation tools.


Level up your Data Vault automation – with DataVault4coalesce

Coalesce is a modern, column-aware data warehouse automation tool. In this webinar, you will learn how Scalefree’s latest publication brings best practices out of the Data Vault world into your coalesce.io experience. This includes data loading patterns, data vault related features, and more! All embedded into easy-to-use UI options to make use of Coalesce’s unique configurable user interface. Tune in to see DataVault4coalesce in action!

Watch Webinar Part 1

Watch Webinar Part 2

And everyone who watched that webinar might remember that at the end, we announced an even closer relationship between coalesce.io and Scalefree and a commitment to bring Scalefree’s best practices into coalesce.io!

If you didn’t watch the webinar, you can find it here.

Recap: What is Coalesce?

coalesce.io is a data transformation solution made for Snowflake. When working with Coalesce, you build directed acyclic graphs (DAGs) that contain nodes. A node represents a physical database object, such as a table or view, or even a stage or external table.

Coalesce itself is built around metadata that stores table and column-level information, which describes the structure inside your data warehouse. This metadata-focused design enables a couple of features that strongly drive towards scalability and manageability. 

Because this metadata is deeply integrated into every workflow, a team can track the past, current, and desired states of the data warehouse. Additionally, users can define standardized patterns and templates at the column and table level.

How can Data Vault jump in here?

These templates open the door to implementing Data Vault 2.0 patterns and best practices. At the table level in particular, building a template for a Hub or a Link quickly comes to mind.

At the column level, a template could be a recurring transformation that is managed in a single macro, so a change is made in one place with little to no impact elsewhere. Hash key calculation and virtual load-end-dating are typical examples.
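To make the two column-level examples concrete, here is a minimal Python sketch of what such a reusable transformation could compute. It is not taken from DataVault4coalesce: the MD5 algorithm, the `||` delimiter, the trim-and-uppercase normalization, and the far-future end date are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
from datetime import datetime
from typing import List, Tuple

# Assumed conventions for this sketch only (not DataVault4coalesce defaults).
DELIMITER = "||"
END_OF_TIME = datetime(9999, 12, 31)


def hash_key(*business_keys: str) -> str:
    """Hash the normalized, delimited business keys into one hex string."""
    normalized = [str(key).strip().upper() for key in business_keys]
    return hashlib.md5(DELIMITER.join(normalized).encode("utf-8")).hexdigest()


def virtual_load_end_dates(load_dates: List[datetime]) -> List[Tuple[datetime, datetime]]:
    """Pair each load date with the next one at query time.

    The end date is derived, never stored, so no existing record is updated.
    The most recent record gets the assumed END_OF_TIME value.
    """
    ordered = sorted(load_dates)
    next_dates = ordered[1:] + [END_OF_TIME]
    return list(zip(ordered, next_dates))
```

Keeping both rules in one place mirrors the idea of a single macro: if the normalization or the end-of-time value changes, only these two functions need to be touched.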

And that is exactly what we at Scalefree have done since the webinar last year. Lead developers from coalesce.io sat together with Data Vault experts and developers from Scalefree with one goal: Create something amazing that helps users to automate their Data Vault 2.0 implementation!

Datavault4Coalesce

How fast can I build a Data Vault? Yes!

This cooperation created DataVault4coalesce, an open-source extension package for coalesce.io, which will be available on March 16th! Let’s have a sneak peek at a selection of what users can do with DataVault4coalesce.

The first release of DataVault4coalesce will feature a basic set of Data Vault 2.0 entities:

While providing DDL and DML templates for the entity types mentioned above, DataVault4coalesce makes use of Coalesce’s ability to define the UI for each node type. For stages, this means that users can decide if they want DataVault4coalesce to generate ghost records automatically or not, as shown in the screenshot below:

Datavault4Coalesce

This Data Vault related interface can be found across all node types and allows users to adjust DataVault4coalesce to fit their requirements conveniently!

Conclusion

One bummer first: DataVault4coalesce will only be available from the 16th of March. But there is no reason to wait that long to dive into coalesce.io itself! Since it is now part of Snowflake Partner Connect, it’s never been easier to get your hands on a fresh coalesce.io environment!

Just sign up for a free Snowflake trial here and initialize your coalesce.io experience within seconds by accessing the Partner Connect portal! Then, you just have to load any desired data into it, and you can start building your data pipelines with coalesce.io. And when the 16th of March finally arrives, you just have to add DataVault4coalesce to your coalesce.io environment – and then you can start to build Data Vault faster than ever!

Also, don’t miss out on this recording, where we show you DataVault4coalesce in action. Watch it here!

(Logical) Information Marts in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke addresses a question from our audience that delves into the intricacies of the CDVP2 training.

“We are having trouble understanding the attached slide 28 of the CDVP2 training.

– What is the difference between Business DV Pits & Bridges and Pits & Bridges?
– We are confused about why Business Vault and Info Mart are put into one logical wrapper. Why does physical and logical wrapper differentiate?”

In this elucidating video, Michael provides clarification on the distinctions between “raw” and “business” Point-in-Time (PIT) and bridge tables. The question prompts a discussion on understanding the nuances of these components within the Data Vault methodology.

Michael shares insights into the reasoning behind grouping Business Vault and Info Mart into one logical wrapper while emphasizing the differentiation between physical and logical wrappers. The discussion provides valuable context for participants seeking clarity on the CDVP2 training material.

Why Data Vault 2.0 Is the Best Data Model for Automation

Watch the Webinar

Many data teams worry that automation won’t work on their specific data and technology stack. They’ve learned the hard way that automation doesn’t always stand up to the complexity of different source data models, taxonomies, and tech stack components. Join this webinar to understand how Data Vault 2.0 is designed to focus on models and logic rather than complex code, which is why it is rapidly becoming the DWH standard.

We’ll explain how Data Vault has taken the best of the more traditional modeling approaches, such as Inmon or Kimball, to provide the level of abstraction, quality, and agility that automation requires.

You’ll learn how the Data Vault model, methodology, and architecture leverage automation, and how we use integration templates based on Data Vault standards to pave the way to fully automated data loading.

This webinar takes you from theory to practice.

Watch Webinar Recording

Webinar Agenda

1. The pros and cons of different data modeling techniques.
2. The prerequisites for automation.
3. Why Data Vault works best.
4. How to create abstractions in data warehousing.
5. Demo: how it’s applied in VaultSpeed.

Supersetting in Data Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke engages with a thoughtful inquiry from our audience.

“Dear Scalefree team, we receive data from the source for multiple company forms (like HoldingCompany, JointVenture), and we want to know if it’s recommended to save them in different entities (e.g., HoldingCompany_h/s, JointVenture_h/s) or one big entity (Company_h/s).

If we split them, we will have for each company form (e.g., Holding Company) about 10 links; If we store everything in one Company entity, we may face the situation that different company forms have different master data in the future, besides, it violates the Data Vault 2.0 rule that we should save the data as delivered by the source.”

In this insightful video, Michael delves into the strategic considerations of applying sub-setting and super-setting in the context of Data Vault 2.0. The question prompts a discussion on where to employ these techniques and the potential exceptions that might arise from the default strategy.

Michael provides practical insights and recommendations for effectively handling diverse company forms within the Data Vault framework, ensuring compliance with Data Vault 2.0 principles while addressing the complexities of master data variations.

Reference Table Vs. Reference Hub in Data Vault

Watch the Video

In this week’s Data Vault Friday, our CEO Michael Olschimke addresses an intriguing question from our audience regarding the difference between a Reference Table and a Reference Hub.

“If I need to historize the reference table, I can use the Satellite pattern. Ok, I have now a Reference Satellite table. But what about the Reference Hub table? Is it effective to create a table with just one column?”

In this informative video, Michael explores the concept of historizing reference tables within Scalefree’s Data Vault 2.0 projects. The question specifically focuses on the efficiency and effectiveness of creating a Reference Hub table with just one column.

Michael shares insights into the considerations and scenarios where creating a Reference Hub table with a single column can be a viable and effective approach. The discussion provides practical guidance for handling reference tables within the Data Vault 2.0 methodology.

Calculating Hash Keys in Business Vault

Watch the Video

In our ongoing Data Vault Friday series, our CEO Michael Olschimke delves into a thought-provoking question from our audience.

“When calculating hash_key in links in Business Vault, it sometimes can be quite expensive to join all hubs to get the business keys, etc. In many cases, we keep those hash_keys to keep the standards only. And even for any case where you may need to build a satellite for that link, that means you would have the same granularity. So is it still a no-go to generate the link hash_key from the hub hash_keys to prevent expensive joins in some cases? If so, what do you suggest?”

In this insightful video, Michael addresses the considerations and challenges related to calculating hash keys in links within the Business Vault. The question prompts a discussion on the trade-offs between keeping hash keys for standards and the potential expense of joins, especially when dealing with multiple hubs.

Michael shares his expertise on hashing practices in Data Vault 2.0 links, offering recommendations and considerations to optimize the balance between standards and performance in the Business Vault.
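The two options raised in the question can be put side by side in a small Python sketch. It does not reproduce the recommendation from the video; it only shows what each variant takes as input. MD5, the `||` delimiter, the normalization, and all function names are assumptions made for illustration.

```python
import hashlib
from typing import List

DELIMITER = "||"  # assumed delimiter for this sketch


def md5_hex(value: str) -> str:
    """Return the MD5 digest of a string as a hex string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def link_hash_from_business_keys(business_keys: List[str]) -> str:
    """Variant 1: hash the business keys, which may require joining the hubs."""
    normalized = [key.strip().upper() for key in business_keys]
    return md5_hex(DELIMITER.join(normalized))


def link_hash_from_hub_hashes(hub_hash_keys: List[str]) -> str:
    """Variant 2: hash the hub hash keys that are already on the row."""
    return md5_hex(DELIMITER.join(hub_hash_keys))
```

Both variants are deterministic, but they produce different values for the same relationship, so a link has to use one of them consistently across all of its loads.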

Top 10 Salesforce Features – 2023 (German)

Watch the Webinar

Discover the latest developments for Salesforce with the Spring ’23 update! Our team has worked through the release notes in detail to present the best new features that are now available in your organization. Come on board and learn how you can use these tools to optimize your workflows and increase your efficiency. Take the opportunity to expand your Salesforce knowledge and improve your skills for the long term.

Watch Webinar Recording

Webinar Agenda

1. Top 10 to 4
2. Top 3 in detail
3. Outlook and Q&A

Speed Up Your Data Vault 2.0 Implementation with TurboVault4dbt


TurboVault4dbt

Scalefree released TurboVault4dbt, an open-source package to automate model generation using DataVault4dbt-compatible templates based on your sources’ metadata.

TurboVault4dbt currently supports metadata input from Excel, Google Sheets, BigQuery, and Snowflake and helps your business by:

  • Speeding up the development process, reducing development costs, and producing faster results
  • Encouraging users to analyze and understand their source data

Speed up Your Data Vault 2.0 Implementation – with TurboVault4dbt

This webinar delves into TurboVault4dbt, an open-source tool by Scalefree that speeds up Data Vault 2.0 implementation. It automates dbt model creation using your source metadata, saving time and costs while encouraging better data analysis.

TurboVault4dbt works with metadata inputs like Excel, Google Sheets, BigQuery, and Snowflake, generating models for hubs, links, and satellites automatically. Just set up your metadata tables, connect the tool, and watch it do the heavy lifting!

Watch webinar recording

‘Isn’t every model kind of the same?’

Datavault4dbt is the result of years of experience in creating and loading Data Vault 2.0 solutions forged into a fully auditable solution for your Data Vault 2.0 powered Data Warehouse using dbt.

But every developer who has worked with the package or has created dbt models for the Raw Vault must have come across one nuisance:

Creating a new dbt model for a table means taking the already existing template and providing it with specific metadata for that table. Doing this over and over again can be quite a chore. This is why we created TurboVault4dbt to automate and speed up this process.

From CTRL+C AND CTRL+V to a simple mouse-click

How many times have you pressed CTRL+C and CTRL+V and then corrected a few lines of code when creating new dbt models for the Raw Vault?

Instead of you having to work out the names of your tables and business keys, or the order in which the columns of your hash key should be hashed, TurboVault4dbt does all of that for you. All TurboVault4dbt needs is a metadata input in which you capture the structure of your data warehouse.


TurboVault4dbt currently requires a structure of five metadata tables:

  • Hub Entities: This table stores metadata information about your Hubs,
    e.g. (Hub Name, Business Keys, Column Sort Order for Hashing, etc.)
  • Link Entities: This table stores metadata information about your Links,
    e.g. (Link Name, Referenced Hubs, Pre-Join Columns, etc.)
  • Hub Satellites: This table stores metadata information about your Hub Satellites,
    e.g. (Satellite Name, Referenced Hub, Column Definition, etc.)
  • Link Satellites: This table stores metadata information about your Link Satellites,
    e.g. (Satellite Name, Referenced Link, Column Definition, etc.)
  • Source Data: This table stores metadata information about your Sources,
    e.g. (Source System, Source Object, Source Schema, etc.)

By capturing the metadata in the five tables above, TurboVault4dbt can extract the necessary information and generate every model that is based on a selected source. Filling in the tables also encourages you, as a user, to analyze and understand your data.
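The general idea of metadata-driven generation can be sketched in a few lines of Python. This is not TurboVault4dbt's code or its output format: the column names (`hub_name`, `business_key`, `sort_order`, `source_object`), the sample rows, and the rendered text are all hypothetical, loosely modeled on the Hub Entities table described above.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rows of a "Hub Entities" metadata table, one row per business key.
HUB_ENTITIES = [
    {"hub_name": "customer_h", "business_key": "customer_no",
     "sort_order": 1, "source_object": "crm_customer"},
    {"hub_name": "order_h", "business_key": "order_no",
     "sort_order": 2, "source_object": "erp_order"},
    {"hub_name": "order_h", "business_key": "company_code",
     "sort_order": 1, "source_object": "erp_order"},
]


def render_hub_models(rows: List[dict]) -> Dict[str, str]:
    """Group metadata rows by hub and render one placeholder model per hub."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["hub_name"]].append(row)

    models = {}
    for hub_name, hub_rows in grouped.items():
        # The sort order fixes the sequence in which business keys are hashed.
        ordered = sorted(hub_rows, key=lambda r: r["sort_order"])
        keys = ", ".join(r["business_key"] for r in ordered)
        sources = ", ".join(sorted({r["source_object"] for r in ordered}))
        models[f"{hub_name}.sql"] = (
            f"-- hub: {hub_name}\n"
            f"-- business keys (hashing order): {keys}\n"
            f"-- sources: {sources}\n"
        )
    return models
```

Calling `render_hub_models(HUB_ENTITIES)` returns one entry per hub. In a real generator, the placeholder comments would be replaced by the template call that the target package expects.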

Conclusion: Lean back, relax and let TurboVault4dbt take over!

Create and fill your metadata tables, connect them to TurboVault4dbt, and enjoy your free time for another cup of coffee. Give it a try, or give us your feedback by visiting TurboVault4dbt on GitHub!

Stay updated on TurboVault4dbt through our marketing channels as great features lie ahead!

PIT Table Structure in Data Vault

Watch the Video

In our continuous Data Vault Friday series, our CEO Michael Olschimke engages with an insightful question from our audience.

“Is it possible to add business keys and/or descriptive attributes to a Point-in-Time (PIT) table to improve performance when filtering or joining data in the information mart?”

In this concise yet informative video, Michael delves into the consideration of enhancing the performance of filtering or joining data in the Information Mart by incorporating business keys and descriptive attributes into a PIT table. The question prompts a discussion on the circumstances and scenarios where denormalizing these elements into a PIT table may be beneficial.

Michael shares practical insights and considerations, providing clarity on when and how the inclusion of business keys and descriptive attributes in a PIT table can contribute to improved performance in data retrieval and analysis within the Information Mart.
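As a rough illustration of what the question is about, the Python sketch below builds PIT rows that carry a denormalized business key next to the usual satellite pointer. It is a simplified assumption of how a PIT table could be populated, with hypothetical names throughout; it is not the structure recommended in the video.

```python
from datetime import date
from typing import Dict, List, Optional


def latest_load_date(load_dates: List[date], snapshot: date) -> Optional[date]:
    """Return the most recent satellite load date on or before the snapshot."""
    eligible = [d for d in load_dates if d <= snapshot]
    return max(eligible) if eligible else None


def build_pit_rows(hub: Dict[str, str],
                   satellite_loads: Dict[str, List[date]],
                   snapshots: List[date]) -> List[dict]:
    """Build one PIT row per snapshot date and hub hash key.

    hub maps hash key -> business key; satellite_loads maps hash key -> load dates.
    """
    rows = []
    for snapshot in snapshots:
        for hash_key, business_key in hub.items():
            rows.append({
                "snapshot_date": snapshot,
                "hub_hash_key": hash_key,
                # Denormalized copy, so filters on the key need no join to the hub.
                "business_key": business_key,
                # None here; a real PIT would typically point to a ghost record.
                "sat_load_date": latest_load_date(
                    satellite_loads.get(hash_key, []), snapshot),
            })
    return rows
```

The extra `business_key` column trades some storage and load effort for simpler filtering, which is the kind of trade-off the question raises.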
