Data is an important asset in the decision making process. As we have previously discussed in another post, Data Vault 2.0 is the right choice when the goal of an enterprise data warehouse is to have fully historized and integrated data. Additionally, it is also better suited to instances in which data from many source systems needs to be combined. You can find the previous blog post here.
While Data Vault 2.0 focuses on the “what”, there are many options for the “how” of technically translating a Data Vault model into physical tables and views in the enterprise data warehouse, as well as for orchestrating and loading/processing the procedures. And this is where dbt comes in.
The data build tool (dbt) transforms your data directly in your data warehouse. For reference, dbt is the “T” in ELT. Therefore, dbt assumes that data is already loaded into a database from which the current database can query from. In contrast, ETL extracts data then transforms it and before loading it into the target. With ELT, the data is not transformed as it moves into the data warehouse. (with transform we mean transformation by soft business rules which would change the meaning of the data. Of course we have to make sure that the data fits into the target table (data type adjustments etc.). Here we are talking about “hard rules”.)
Dbt is particularly compatible and useful in cloud DWH solutions such as Snowflake, Azure Synapse Analytics, BigQuery and Redshift and performs transformations and modeling directly on the database in order to take advantage of the performance of these highly scalable platforms.
HOW DBT WORKS
Models and SQL statements can be easily created, tested and managed in dbt itself. A powerful combination of the scripting language, Jinja2, and the all-time classic SQL allows users to build models. The simple interface enables data analysts without engineering know-how to initiate appropriate transformations. Data team workflows become more efficient and cost-effective because of this. Behind dbt sits an open source community that is constantly and passionately developing the tool. As such, dbt is available both as a free, reduced core version and as a comprehensive and flexible cloud version.
What is the place of dbt in Data Vault 2.0?
Part of the Data Vault 2.0 methodology is the model which focuses on how to design the core data warehouse with a scalable solution. The core Data Vault entities are hubs, links and satellites. That said, dbt provides the ability to generate Data Vault models and also allows you to write your data transformations using SQL and code-reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way.
The first step in setting up a data warehouse project in an agile way is defining the objective of the project with just one or two pages. Unlike waterfall projects, the goal is to produce working pieces of usable outputs, could be reports or dashboards, in continuous iterations, otherwise called sprints. This means that we don’t need to plan the entire project in detail but instead can build around a general idea or goal for the final data warehouse before then focusing on planning the first sprints. In order to address the aforementioned problems, the focus of the sprints needs to be centered around business value. For this reason, it is important to receive constant feedback from the business users for a continuous improvement process.
– by Ole Bause (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.