Introduction to dbt
dbt (data build tool) revolutionizes the way teams build and maintain analytics workflows by bringing software‐engineering best practices to SQL‐based data transformations. Instead of ad‐hoc scripts, dbt encourages version control, modular models, testing, documentation, and lineage graphs. In this guide, you’ll learn how to go from zero to your first dbt models—running entirely in the cloud on Snowflake—in under 15 minutes.
In this article:
- Why Choose dbt Cloud + Snowflake?
- Prerequisites
- 1. Launching dbt Cloud from Snowflake
- 2. Exploring the dbt Cloud UI
- 3. Initializing Your dbt Project
- 4. Defining Your Source Data
- 5. Building a Staging Model
- 6. Creating a Production-Ready Dimension
- 7. Testing and Documentation
- 8. Version Control & Deployment
- 9. Best Practices & Next Steps
- Conclusion
Why Choose dbt Cloud + Snowflake?
- Fully managed: No local install or complex orchestration; dbt Cloud handles hosting.
- Quick setup: Snowflake’s partner connector spins up a dbt trial, pre-configured with your credentials.
- Best practices out of the box: Built-in IDE, job scheduling, Git integration, and documentation.
- Scalable performance: Leverages Snowflake’s compute power for fast model builds.
Prerequisites
- A Snowflake trial account (free, 1-minute setup).
- A modern browser (Chrome, Firefox) or VS Code for remote development.
- Basic familiarity with SQL.
1. Launching dbt Cloud from Snowflake
Once logged into your Snowflake trial, navigate to Data Products → Partner Connect. Scroll to find the dbt tile and click Connect → Launch. This will automatically:
- Provision a dbt Cloud trial account
- Create a new Snowflake database and warehouse
- Inject Snowflake credentials into your dbt Cloud connection
You’ll land in the dbt Cloud dashboard, ready to start your first project.
2. Exploring the dbt Cloud UI
In dbt Cloud’s left navigation bar you’ll find:
- Develop: Interactive IDE for coding models, sources, tests, documentation.
- Deploy: Job definitions, environments, and run history.
- Documentation: Auto‐generated docs site with lineage graphs.
- Settings: Account, project, and Git integration.
Click Develop → IDE (browser-hosted, or connect from VS Code). Let’s initialize our dbt project.
3. Initializing Your dbt Project
- In the IDE, open the Version Control pane and click Initialize dbt Project.
- Accept the defaults; dbt creates a dbt_project.yml and folder structure (models/, macros/, etc.).
- Commit the auto‐generated files in a new Git branch: “initialized dbt project”.
Your Git pane now shows uncommitted files; click Commit & Push to save the project baseline.
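For orientation, the generated dbt_project.yml looks roughly like the sketch below; the project name my_dbt_project and the exact defaults are assumptions, so expect minor differences in your own file.
# dbt_project.yml (illustrative sketch; names and defaults are assumptions)
name: my_dbt_project
version: '1.0.0'

profile: default              # connection profile managed by dbt Cloud

model-paths: ["models"]
macro-paths: ["macros"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analyses"]

models:
  my_dbt_project:
    +materialized: view       # project-wide default unless a model overrides it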
4. Defining Your Source Data
dbt doesn’t load data from external systems; you must point it to existing tables. Snowflake’s sample database (SNOWFLAKE_SAMPLE_DATA) contains TPC-H tables you can use.
Create a new YAML file under models/ named sources.yml with:
version: 2

sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF1
    tables:
      - name: CUSTOMER
      - name: ORDERS
Save to see the lineage graph update with two new source nodes.
5. Building a Staging Model
Staging models standardize raw tables and prepare them for downstream transformations. In models/, delete the example/ folder and instead:
- Create a folder called models/staging/tpch.
- In that folder, create stg_tpch_customer.sql:
{{ config(materialized='view') }}

select
    C_CUSTKEY as customer_key,
    C_NAME as customer_name,
    C_ACCTBAL as account_balance,
    C_COMMENT as comment
from {{ source('tpch', 'CUSTOMER') }}
Notes:
- The source() macro resolves to the fully qualified table name.
- materialized='view' tells dbt to build a view by default.
Save and click Run → dbt run to build the project, which at this point contains just this one model (you can also target it directly with dbt run --select stg_tpch_customer). In seconds you’ll see the view appear in your Snowflake UI under your development schema.
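As an alternative to repeating the {{ config(...) }} block in every file, materializations can also be set per folder in dbt_project.yml. A minimal sketch, assuming the project name my_dbt_project from the earlier example:
# dbt_project.yml (excerpt): folder-level materialization defaults
models:
  my_dbt_project:
    staging:
      +materialized: view     # everything under models/staging/ builds as a view
    marts:
      +materialized: table    # curated marts build as tables
An in-file config() block still takes precedence over these folder defaults if both are present.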
6. Creating a Production-Ready Dimension
Dimensions (Gold layer) contain curated, business-ready tables. Let’s filter for customers with positive balances.
- Create models/marts/customer_dim.sql:
{{ config(materialized='table') }}

select
    customer_key,
    customer_name,
    account_balance
from {{ ref('stg_tpch_customer') }}
where account_balance > 0
Here, ref() links to another model, ensuring correct build order and clear lineage.
Click Compile to preview generated SQL, then Run → dbt run to create the table in Snowflake.
7. Testing and Documentation
dbt encourages tests to enforce data quality:
- Add to models/staging/tpch/schema.yml:
version: 2

models:
  - name: stg_tpch_customer
    columns:
      - name: customer_key
        tests:
          - not_null
          - unique
Run dbt test to validate your models. Any failures will be reported in the UI.
Generate documentation with dbt docs generate; dbt Cloud hosts the resulting docs site for you (with dbt Core locally, you’d preview it with dbt docs serve). Explore your project’s lineage graph and column descriptions.
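Column and model descriptions live in the same YAML files as your tests, which is what populates the docs site. A small sketch extending the schema.yml above (the description text is illustrative):
# models/staging/tpch/schema.yml (excerpt): descriptions are illustrative
models:
  - name: stg_tpch_customer
    description: "One row per TPC-H customer, with standardized column names."
    columns:
      - name: customer_key
        description: "Customer primary key (C_CUSTKEY in the raw table)."
        tests:
          - not_null
          - unique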
8. Version Control & Deployment
dbt Cloud integrates Git for collaboration. After feature development:
- Commit your branch and open a pull request (GitHub, GitLab, Bitbucket).
- Merge into main.
In Deploy → Environments, create a production environment. Under Jobs, define a job that runs:
- dbt seed (if you have local CSV seeds)
- dbt run
- dbt test
Schedule the job (e.g., hourly, daily) or trigger it on Git commits. Monitor run history and logs directly in dbt Cloud.
9. Best Practices & Next Steps
- Modularize models: Break complex logic into smaller models.
- Document extensively: Use YAML descriptions for sources, models, and columns.
- Implement CI/CD: Integrate dbt Cloud jobs with your team’s CI pipeline.
- Leverage analyses: Create analyses/ for ad-hoc queries and charts.
- Scale with packages: Reuse community packages (e.g., dbt_utils); see the packages.yml sketch below.
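Packages are declared in a packages.yml file at the project root and installed with dbt deps. A minimal sketch for dbt_utils (the version pin is illustrative; check dbt Hub for the current release):
# packages.yml: third-party packages, installed with `dbt deps`
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1            # illustrative pin; use the latest release from dbt Hub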
For deeper dives, explore the official dbt documentation and the dbt Hub for community packages and best practices.
Conclusion
In just a few steps, you’ve:
- Provisioned dbt Cloud via Snowflake
- Initialized a dbt project with Git
- Defined raw sources and built staging models
- Created a production dimension with ref()
- Tested data quality and generated documentation
- Set up a CI/CD job for automated deployment
Now you have a repeatable, maintainable analytics pipeline. Keep building new models, add tests, and document as you go—your future self (and team!) will thank you.