All Posts By

Tim Kirschke

Tim Kirschke is a Managing BI Consultant and Head of Internal Development at Scalefree. With a background in Applied Mathematics, he specializes in architecting auditable data solutions using Microsoft Fabric, Snowflake, and dbt. A dbt Certified Architect and CDVP2, Tim has led major warehouse implementations and conducts strategic workshops on data automation and enablement.

Enterprise Data Transformations with Turbovault and dbt Cloud

Watch Webinar Recording

Webinar Summary

Data Vault is vital for businesses due to its adaptability and scalability in managing dynamic data environments. Its hub-and-spoke architecture ensures traceability and agility, enabling quick adaptation to changing requirements and diverse data sources.

Join our webinar and learn how to use dbt Cloud with Turbovault and a data modeling tool to implement Data Vault in your organization.

In this webinar you will

  • Receive a detailed 90-minute “show-and-tell” walkthrough of an end-to-end Data Vault implementation using cutting-edge tools
  • Explore the seamless integration of Ellie.ai, Turbovault4dbt, and Datavault4dbt for enhanced data modeling and automation
  • Understand the practical aspects of implementing a Data Vault without the need for a pre-configured demo environment.

Webinar Details

  • Date: 27th February
  • Time: 14:00 – 15:45 CET

Webinar Agenda

  1. Introduction to the Power Trio: dbt Cloud, Turbovault, and Data Modeling Tools
  2. 90-Minute “Show-and-Tell” Walkthrough of an End-to-End Data Vault Implementation
    • Using Ellie.ai for ER Model, Turbovault4dbt for dbt Automation, and Datavault4dbt for DV Model Generation
  3. Insights into Data Vault Implementation in Medium and Large Sized Companies
  4. Q&A Session with Industry Experts

Bring Your Data Vault Automation to the Next Level with DataVault4coalesce

Data Vault Automation with DataVault4coalesce

A cooperation created DataVault4coalesce, an open source extension package for coalesce.io. In a previous webinar, we explored coalesce.io, a new competitor in the highly contested market of data warehouse automation tools.

Level up your Data Vault automation – with DataVault4coalesce

Coalesce is a modern, column-aware data warehouse automation tool. In this webinar, you will learn how Scalefree’s latest publication brings best practices out of the Data Vault world into your coalesce.io experience. This includes data loading patterns, data vault related features, and more! All embedded into easy-to-use UI options to make use of Coalesce’s unique configurable user interface. Tune in to see DataVault4coalesce in action!

Watch Webinar Part 1
Watch Webinar Part 2

And everyone who watched that webinar might remember that at the end, we announced an even closer relationship between coalesce.io and Scalefree and a commitment to bring Scalefree’s best practices into coalesce.io!

For those who didn’t watch the webinar, you can find it here.

Recap: What is Coalesce?

coalesce.io is a data transformation solution made for Snowflake. When working with Coalesce, you build directed acyclic graphs (DAGs) which contain nodes. A node represents any physical database object, like a table or view, or even a stage or external table.

Coalesce itself is built around metadata that stores table and column-level information, which describes the structure inside your data warehouse. This metadata-focused design enables a couple of features that strongly drive towards scalability and manageability. 

Because this metadata is deeply integrated into every workflow, a team can track past, current, and desired states of the data warehouse. Additionally, users can define standardized patterns and templates on the column and table level.

How can Data Vault jump in here?

These mentioned templates open up the gate to implement Data Vault 2.0 related patterns and best practices. Especially on the table level, it might quickly come to mind that you could try to build a template for a Hub or a Link.

On the column level, this could be a repeated transformation that is managed in a single so-called macro, so a change has to be made in only one place and has low to zero impact elsewhere. Think of hash key calculation or virtual load-end-dating here.
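To make the idea of a single, reusable hash key transformation more tangible, here is a minimal Python sketch. It is not the DataVault4coalesce macro itself: the function name `hash_key`, the `||` delimiter, the `^^` placeholder for missing values, and the choice of MD5 are all assumptions made for illustration.

```python
import hashlib


def hash_key(*business_keys, delimiter="||", null_placeholder="^^"):
    """Return an uppercase MD5 hex digest over normalized business keys.

    Assumed conventions (illustrative only): values are trimmed and
    upper-cased, and empty or missing values become a fixed placeholder.
    """
    parts = []
    for key in business_keys:
        if key is None or str(key).strip() == "":
            parts.append(null_placeholder)
        else:
            parts.append(str(key).strip().upper())
    joined = delimiter.join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


# The same business key in different spellings yields the same hash key.
customer_hk = hash_key(" c-1001 ")
same_customer_hk = hash_key("C-1001")
```

Because the normalization lives in one function, changing the delimiter or the hash algorithm later means touching a single place, which is the benefit the macro approach is after.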

And that is exactly what we at Scalefree have done since the webinar last year. Lead developers from coalesce.io sat together with Data Vault experts and developers from Scalefree with one goal: Create something amazing that helps users to automate their Data Vault 2.0 implementation!

How fast can I build a Data Vault? Yes!

This cooperation created DataVault4coalesce, an open-source extension package for coalesce.io, which will be available on March 16th! Let’s have a sneak peek at a selection of what users can do with DataVault4coalesce.

The first release of DataVault4coalesce will feature a basic set of Data Vault 2.0 entities:

While providing DDL and DML templates for the entity types mentioned above, DataVault4coalesce makes use of Coalesce’s ability to define the UI for each node type. For stages, this means that users can decide if they want DataVault4coalesce to generate ghost records automatically or not, as shown in the screenshot below:

This Data Vault related interface can be found across all node types and allows users to adjust DataVault4coalesce to fit their requirements conveniently!

Conclusion

First, a bummer: DataVault4coalesce will only be available from the 16th of March. But there is no reason to wait that long to dive into coalesce.io itself! Since it is now part of Snowflake Partner Connect, it’s never been easier to get your hands on a fresh coalesce.io environment!

Just sign up for a free Snowflake trial here and initialize your coalesce.io experience within seconds by accessing the Partner Connect portal! Then you just have to load any desired data into it, and you can start building your data pipelines with coalesce.io. And when the 16th of March finally arrives, you just have to add DataVault4coalesce to your coalesce.io environment, and you can start building your Data Vault faster than ever!

Also, don’t miss out on this recording, where we will show you DataVault4coalesce in action. Watch it here!

Coalesce and Data Vault 2.0 – A Perfect Match?

Watch the Webinar

This webinar introduces the data warehousing automation tool coalesce.io and how it can be used to create a Data Vault 2.0-powered data warehouse solution. You will see live demonstrations of the tool and the data vault entities.

Learn why Data Vault 2.0 is the perfect choice for data warehouse automation tools like Coalesce and how Coalesce can kickstart your Data Vault 2.0 solution!

Watch Webinar Recording

Webinar Agenda

1. Introduction to Coalesce
2. Demo Session
3. Introduction to Data Vault 2.0
4. Why Coalesce and Data Vault?
5. Demo Session

Kick-Start Your Data Vault 2.0 Implementation with Datavault4DBT

Scalefree has released datavault4dbt, an open-source package that provides best-practice loading templates for Data Vault 2.0 entities, embedded into the open-source data warehouse automation tool dbt.

Datavault4dbt currently supports Snowflake, BigQuery and Exasol and comes with a lot of great features:

  • A Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt
  • Ready for both Persistent Staging Areas and Transient Staging Areas, due to the allowance of multiple deltas in all macros, without losing any intermediate changes
  • Creating a centralized, snapshot-based Business interface by using a centralized snapshot table supporting logarithmic logic
  • Optimizing incremental loads by implementing a high-water-mark that also works for entities that are loaded from multiple sources

Kickstart your Data Vault 2.0 Implementation – with datavault4dbt

This webinar delves into datavault4dbt, an open-source package by Scalefree that simplifies Data Vault 2.0 implementation in dbt. It provides best-practice templates for hubs, links, and satellites, ensures compliance with Data Vault standards, and supports flexible staging with optimized incremental loads. You won’t want to miss it.

Watch webinar recording

Building a Data Vault 2.0 Solution – made easy

The overall goal of releasing Data Vault 2.0 templates for dbt is to combine our years of experience in creating and loading Data Vault 2.0 solutions into publicly available loading patterns and best practices for everyone to use. Out of this ambition, datavault4dbt, an open-source package for dbt, was created and will be maintained by the Scalefree expert team.

The most valuable characteristic of datavault4dbt is that it embodies the original Data Vault 2.0 definition by Dan Linstedt. It represents a fully auditable solution for your Data Vault 2.0-powered data warehouse. With a straightforward, standardized approach, it enables the team to conduct agile development cycles.

By allowing multiple increments per batch while loading each Data Vault entity type, datavault4dbt supports both Persistent and Transient Staging Areas without losing any intermediate changes. These incremental loads are further optimized by a dynamic high-water-mark that also works when loading an entity from multiple sources.
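The high-water-mark idea can be sketched in a few lines of Python. This is a simplified illustration and not the SQL that datavault4dbt generates: the column names `record_source` and `load_ts` and both function names are assumptions. The point is that the mark is tracked per source, so a slow source does not hold back a fast one.

```python
from datetime import datetime


def high_water_marks(target_rows):
    """Return the latest load timestamp already in the target, per source."""
    marks = {}
    for row in target_rows:
        source = row["record_source"]
        if source not in marks or row["load_ts"] > marks[source]:
            marks[source] = row["load_ts"]
    return marks


def rows_to_load(stage_rows, target_rows):
    """Keep only stage rows that are newer than their source's mark."""
    marks = high_water_marks(target_rows)
    return [
        row
        for row in stage_rows
        if row["record_source"] not in marks
        or row["load_ts"] > marks[row["record_source"]]
    ]


target = [
    {"record_source": "crm", "load_ts": datetime(2024, 1, 2)},
    {"record_source": "erp", "load_ts": datetime(2024, 1, 5)},
]
stage = [
    {"record_source": "crm", "load_ts": datetime(2024, 1, 2), "bk": "A"},
    {"record_source": "crm", "load_ts": datetime(2024, 1, 3), "bk": "B"},
    {"record_source": "erp", "load_ts": datetime(2024, 1, 4), "bk": "C"},
    {"record_source": "web", "load_ts": datetime(2023, 12, 1), "bk": "D"},
]
pending = rows_to_load(stage, target)
```

Here only the newer `crm` row and the row from the not-yet-loaded `web` source remain in `pending`; everything at or below its source's mark is skipped before any more expensive comparison takes place.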

Additionally, datavault4dbt encourages strict naming conventions and standards by implementing a variety of global variables that span across all Data Vault layers and supported Databases. The process of end-dating data is completely virtualized to ensure a modern insert-only approach that avoids updating data.
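As a rough illustration of virtualized end-dating, the following Python sketch derives a load end date at read time instead of updating stored rows. The column names `hk`, `ldts`, and `ledts`, the function name, and the `END_OF_ALL_TIMES` value are assumptions for this example; in a database, the same result would typically come from a window function inside a view.

```python
from datetime import datetime

# Assumed placeholder for "still current"; the real value is configurable.
END_OF_ALL_TIMES = datetime(8888, 12, 31, 23, 59, 59)


def with_load_end_dates(rows):
    """Return copies of the rows with a derived 'ledts' column.

    Each version ends when the next version of the same hash key was
    loaded; the latest version ends at END_OF_ALL_TIMES. The stored rows
    themselves are never modified (insert-only).
    """
    by_key = {}
    for row in rows:
        by_key.setdefault(row["hk"], []).append(row)

    result = []
    for versions in by_key.values():
        ordered = sorted(versions, key=lambda r: r["ldts"])
        for current, following in zip(ordered, ordered[1:] + [None]):
            ledts = following["ldts"] if following else END_OF_ALL_TIMES
            result.append({**current, "ledts": ledts})
    return result


satellite = [
    {"hk": "A", "ldts": datetime(2024, 1, 1), "city": "Hannover"},
    {"hk": "A", "ldts": datetime(2024, 2, 1), "city": "Berlin"},
    {"hk": "B", "ldts": datetime(2024, 1, 15), "city": "Hamburg"},
]
end_dated = with_load_end_dates(satellite)
```

The input list is left untouched, which mirrors the insert-only approach: the end date exists only in the derived result.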

With all these features, datavault4dbt is the perfect solution for your modern Big Data Enterprise Data Warehouse.

From the Stage over the Spine into the PITs

To achieve all this, we worked hard on creating a solid and universal staging area. All hashkeys and hashdiffs are calculated here, and users are given the option to add derived columns, generate prejoins with other stages, and add ghost records to their data. All of this is highly automated based on parameterized user input.
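To show what a hashdiff calculated in the stage is good for, here is a small Python sketch of change detection. It is an illustration under assumptions, not the datavault4dbt stage macro: the function names, the `||` delimiter, and the use of MD5 are made up for this example.

```python
import hashlib


def hashdiff(record, attributes, delimiter="||"):
    """Hash the descriptive attributes of a record in a fixed order."""
    values = [
        "" if record.get(name) is None else str(record[name]).strip()
        for name in attributes
    ]
    return hashlib.md5(delimiter.join(values).encode("utf-8")).hexdigest()


def is_new_delta(incoming, latest_stored, attributes):
    """A satellite row is inserted only if its hashdiff has changed."""
    if latest_stored is None:
        return True
    return hashdiff(incoming, attributes) != hashdiff(latest_stored, attributes)


descriptive = ["name", "city"]
stored = {"name": "Ada", "city": "Hannover", "load_ts": "2024-01-01"}
unchanged = {"name": "Ada", "city": "Hannover", "load_ts": "2024-01-02"}
moved = {"name": "Ada", "city": "Berlin", "load_ts": "2024-01-03"}
```

Comparing one hash value instead of every descriptive column keeps the delta check cheap, and technical columns such as the load timestamp stay out of the comparison because they are not part of `descriptive`.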

Based on staging areas, the Data Vault 2.0 spine can be created. Hubs, Links and Non-Historized Links can be loaded from multiple sources including mapping options to ensure business harmonization. 

This spine is then enriched by Standard Satellites, Non-Historized Satellites, Multi-Active Satellites and/or Record-Tracking Satellites. All of those that require it come with a version 0 for tables and a version 1 for end-dated views. 

Based on the Raw Data Vault, PITs can be created automatically, and their loading is backed by an automated, highly-configurable but optional logarithmic snapshot logic. This logic is included in the Control Snapshot Table, which also comes in two consecutive versions. To wrap the logarithmic snapshot logic up, a post-hook for cleaning up all PITs is included and comes in handy.
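The idea behind a logarithmic snapshot logic is that recent snapshots are kept at fine granularity while older ones are thinned out. The Python sketch below shows one possible rule set; the thresholds, the parameter names, and the choice of Mondays and first-of-month dates are assumptions for illustration and do not reflect the package's actual configuration.

```python
from datetime import date


def is_active(snapshot, today, daily_days=14, weekly_days=90, monthly_days=365):
    """Decide whether a snapshot date should stay active.

    Assumed rules: keep every day for the most recent period, then only
    Mondays, then only the first of each month, then only January 1st.
    """
    age = (today - snapshot).days
    if age < 0:
        return False  # snapshots in the future are never active
    if age <= daily_days:
        return True
    if age <= weekly_days:
        return snapshot.weekday() == 0  # Monday
    if age <= monthly_days:
        return snapshot.day == 1
    return snapshot.month == 1 and snapshot.day == 1


today = date(2024, 3, 15)
candidates = [
    date(2024, 3, 10),  # recent: kept daily
    date(2024, 2, 1),   # weekly range, a Thursday
    date(2024, 2, 5),   # weekly range, a Monday
    date(2023, 10, 1),  # monthly range, first of month
    date(2022, 6, 1),   # yearly range, not January 1st
    date(2022, 1, 1),   # yearly range, January 1st
]
active = [d for d in candidates if is_active(d, today)]
```

A cleanup step like the post-hook mentioned above would then remove PIT rows whose snapshot date is no longer active.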

Start now and boost your Data Vault experience!

Did the lines above make you think “Nah, that’s all too good to be true!”? See for yourself, or give us your highly appreciated feedback, by visiting datavault4dbt on GitHub!

Of course, our future ambitions for datavault4dbt are high and next on our list are a lot of important topics, like:

  • Provide a detailed working example of datavault4dbt
  • Extend and migrate the existing documentation of the package
  • Support more and more databases
  • Add more advanced and specific Data Vault 2.0 entities
  • Develop automated Data Vault related tests
  • Review and implement user feedback and suggestions

Stay tuned for more datavault4dbt content on all our marketing channels!

Data Warehouse Automation – Build or Buy?

Watch the Webinar

In this webinar, we take a sneak peek into one of the hot topics of modern data warehousing: Data Warehouse Automation. We will break down the basics of DW automation and how it has brought about a cultural shift in the modern data warehouse and its architecture.

In this regard, we will also touch upon the often-asked question “Build or buy” along with sharing our experience working with several customers who have benefited immensely from automation and the key lessons we have learned as part of our overall DW automation journey.

This webinar is for anyone who loves Data!

Watch Webinar Recording

Webinar Agenda

1. Understanding Data Warehouse Automation
2. Drivers for Decision Making
3. Automation in Data Vault
4. Anti-Patterns in DV Automation
5. Best Practices

Speed Up Your Vault with VaultSpeed – Success Through Automation – Part 2

Watch the Webinar

In this Webinar, we take a closer look at Data Warehouse Automation and how easily it can be implemented. First, we will break down the basics of Data Warehouse Automation.

Then we will show how Data Vault 2.0 can contribute to successful Automation on the basis of a sample COVID-19 data set showing vaccine and infection numbers.

Lastly, VaultSpeed will give a demonstration of their tool, implementing our suggested model with the data set and showing viewers how “simple and easy” VaultSpeed is as a Data Warehouse Automation tool.

This webinar is for everyone who wants to learn about Data Warehouse Automation and get a sneak peek into VaultSpeed.

Watch Webinar Recording

Webinar Agenda

1. Understanding Data Warehouse Automation
2. Automation in Data Vault 2.0
3. Use Case
4. VaultSpeed Demo

Speed Up Your Vault with VaultSpeed – Success Through Automation – Part 1

Watch the Webinar

In this Webinar, we take a closer look at Data Warehouse Automation and how easily it can be implemented. First, we will break down the basics of Data Warehouse Automation.

Then we will show how Data Vault 2.0 can contribute to successful Automation on the basis of a sample COVID-19 data set showing vaccine and infection numbers.

Lastly, VaultSpeed will give a demonstration of their tool, implementing our suggested model with the data set and showing viewers how “simple and easy” VaultSpeed is as a Data Warehouse Automation tool.

This webinar is for everyone who wants to learn about Data Warehouse Automation and get a sneak peek into VaultSpeed.

Watch Webinar Recording

Webinar Agenda

1. Understanding Data Warehouse Automation
2. Automation in Data Vault 2.0
3. Use Case
4. VaultSpeed Demo

Running Modern ETL-Processes with Framework-Based Tools – Part 2

In the last blog post, we introduced Singer, the open-source framework, as a powerful tool for ETL processes. This time, we’d like to discuss how you can implement the framework in your own projects.

How to start working with Singer

Starting a test run is rather simple. First, you need to set up a Python environment; step-by-step instructions for this are available online.

As soon as you’ve done that, it’s time to create your first virtual environment inside Python.
Please note before beginning that it’s a best practice to create and use an individual virtual environment for every tap and target. This avoids conflicts between the module requirements of the different taps and targets.

The next step is to install the tap and target you’ve chosen into their corresponding virtual environments. This installation can be performed very easily using a pip install command. This example command installs tap-salesforce to load data from your Salesforce account:
Continue Reading

Running Modern ETL-Processes with Framework-Based Tools – Part 1

ETL or ELT processes are a big part of every enterprise data warehouse.
Both abbreviations stand for the same words; only the order in which the steps are performed changes.
To brush up on those processes: “E” stands for extraction, “T” for transformation, and “L” for loading.

That said, rather than dive into the benefits of each, we would like to present a powerful open-source framework for executing these processes.

Why use a framework?

Rather than developing individual solutions per source system, using standardized frameworks provides a wide variety of benefits. The main one we have already mentioned: standardization.
Another benefit is that using the same concept for extracting data from different source systems makes your system more auditable and reliable.
And when taking into consideration the differences between frameworks, other potential upsides become available as well.

Continue Reading

Document Processing in MongoDB

Continuing our ongoing blog series, this piece describes the basics of querying and modifying data in MongoDB, with a focus on what is needed for the Data Vault load as well as query patterns.

In contrast to the tables used by relational databases, MongoDB uses a JSON-based document data model. Documents are a more natural way to represent data: a single structure with related data embedded as sub-documents and arrays collapses what is otherwise separated into parent-child tables linked by foreign keys in a relational database. You can model data in any way that your application demands, from rich, hierarchical documents through to flat, table-like structures, simple key-value pairs, text, geospatial data, and the nodes and edges used in graph processing.
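The contrast between one embedded document and parent-child tables can be shown with plain Python data structures. The customer document, its field names, and the helper `to_relational` are made up for illustration; no MongoDB driver is involved.

```python
# One document holds the customer and its addresses as an embedded array.
customer_doc = {
    "_id": 1001,
    "name": "Ada Example",
    "addresses": [
        {"type": "billing", "city": "Hannover"},
        {"type": "shipping", "city": "Berlin"},
    ],
}


def to_relational(doc):
    """Split a document into a parent row and child rows.

    The child rows carry the parent's _id as a foreign key, which is
    what a relational model would need to link the two tables again.
    """
    parent = {key: value for key, value in doc.items() if key != "addresses"}
    children = [
        {"customer_id": doc["_id"], **address}
        for address in doc.get("addresses", [])
    ]
    return parent, children


customer_row, address_rows = to_relational(customer_doc)
```

Reading the document requires no join at all, while the relational form needs the `customer_id` foreign key on every address row to reassemble the same information.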

Continue Reading

Processing Enterprise Data with Documents in MongoDB

Today’s enterprise organizations receive and process data from a variety of sources, including silos generated by web and mobile applications, social media, artificial intelligence solutions, and IoT sensors. That said, efficiently processing this data at high volume in an enterprise setting is still a challenge for many organizations.

Typical challenges include the integration of mainframe data with real-time IoT messages and hierarchical documents.
One such issue is that enterprise data is not clean and might have contradicting characteristics as well as interpretations. This poses a challenge for many processes, such as integrating customers from multiple source systems.

Data cleansing could be considered a solution to this problem. However, what if different data cleansing rules should be applied to the incoming data set? This happens, for example, because the basic assumption of “a single version of the truth” doesn’t hold in most enterprises. While one department might have a clear understanding of how the incoming data should be cleansed, another department, or an external party, might have another understanding.

Continue Reading
