Get Started with dbt: A Quick 15-Minute Guide Using Snowflake

Introduction to dbt

dbt (data build tool) revolutionizes the way teams build and maintain analytics workflows by bringing software-engineering best practices to SQL-based data transformations. Instead of ad-hoc scripts, dbt encourages version control, modular models, testing, documentation, and lineage graphs. In this guide, you’ll learn how to go from zero to your first dbt models, running entirely in the cloud on Snowflake, in under 15 minutes.



Why Choose dbt Cloud + Snowflake?

  • Fully managed: No local install or complex orchestration; dbt Cloud handles hosting.
  • Quick setup: Snowflake’s partner connector spins up a dbt trial, pre-configured with your credentials.
  • Best practices out of the box: Built-in IDE, job scheduling, Git integration, and documentation.
  • Scalable performance: Leverages Snowflake’s compute power for fast model builds.

Prerequisites

  1. A Snowflake trial account (free, 1-minute setup).
  2. A modern browser (Chrome, Firefox) or VS Code for remote development.
  3. Basic familiarity with SQL.

1. Launching dbt Cloud from Snowflake

Once logged into your Snowflake trial, navigate to the Data Products → Partners → Connect pane. Scroll to find the dbt entry and click Connect → Launch. This will automatically:

  • Provision a dbt Cloud trial account
  • Create a new Snowflake database and warehouse
  • Inject Snowflake credentials into your dbt Cloud connection

You’ll land in the dbt Cloud dashboard, ready to start your first project.

2. Exploring the dbt Cloud UI

In dbt Cloud’s left navigation bar you’ll find:

  • Develop: Interactive IDE for coding models, sources, tests, documentation.
  • Deploy: Job definitions, environments, and run history.
  • Documentation: Auto-generated docs site with lineage graphs.
  • Settings: Account, project, and Git integration.

Click Develop → IDE (hosted in your browser, or connect via VS Code for remote development). Let’s initialize our dbt project.

3. Initializing Your dbt Project

  1. In the IDE, open the Version Control pane and click Initialize dbt Project.
  2. Accept the defaults; dbt creates a dbt_project.yml and the standard folder structure (models/, macros/, etc.). A sketch of the generated file appears below.
  3. The Git pane now lists the auto-generated files as uncommitted changes; click Commit & Push on a new branch with a message such as “initialized dbt project” to save the project baseline.
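For reference, the generated dbt_project.yml looks roughly like the following sketch; the project and profile names are whatever the initializer assigned, so treat them as placeholders.

# dbt_project.yml (sketch; project and profile names are placeholders)
name: 'my_dbt_project'
version: '1.0.0'
profile: 'default'

model-paths: ["models"]
macro-paths: ["macros"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analyses"]

target-path: "target"          # compiled SQL and run artifacts land here
clean-targets: ["target", "dbt_packages"]

models:
  my_dbt_project:
    +materialized: view        # default materialization unless a model overrides it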

4. Defining Your Source Data

dbt doesn’t load data from external systems—you must point it to existing tables. Snowflake’s sample database (SNOWFLAKE_SAMPLE_DATA) contains TPC-H tables you can use.

Create a new YAML file under models/ named sources.yml with:

version: 2

sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF1
    tables:
      - name: CUSTOMER
      - name: ORDERS

Save to see the lineage graph update with two new source nodes.
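To sanity-check the new source definition, you can run a quick preview query from the IDE (a hedged example; the limit is arbitrary):

select *
from {{ source('tpch', 'ORDERS') }}
limit 10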

5. Building a Staging Model

Staging models standardize raw tables and prepare them for downstream transformations. In models/, delete the example/ folder and instead:

  1. Create a folder called models/staging/tpch.
  2. In that folder, create stg_tpch_customer.sql:
{{ config(materialized='view') }}

select
  C_CUSTKEY   as customer_key,
  C_NAME      as customer_name,
  C_ACCTBAL   as account_balance,
  C_COMMENT   as comment
from {{ source('tpch', 'CUSTOMER') }}

Notes:

  • The source() macro resolves to the fully qualified table (see the compiled example below).
  • materialized='view' tells dbt to build a view by default.
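For illustration, the compiled SQL behind this model resolves the source() call to the fully qualified sample table, roughly:

select
  C_CUSTKEY   as customer_key,
  C_NAME      as customer_name,
  C_ACCTBAL   as account_balance,
  C_COMMENT   as comment
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER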

Save and click Run → dbt run to build just this model. In seconds you’ll see a view in your Snowflake UI under the dev schema.

6. Creating a Production-Ready Dimension

Dimensions (Gold layer) contain curated, business-ready tables. Let’s filter for customers with positive balances.

  1. Create models/marts/customer_dim.sql:
{{ config(materialized='table') }}

select
  customer_key,
  customer_name,
  account_balance
from {{ ref('stg_tpch_customer') }}
where account_balance > 0

Here, ref() links to another model, ensuring correct build order and clear lineage.

Click Compile to preview generated SQL, then Run → dbt run to create the table in Snowflake.
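As a rough illustration, ref('stg_tpch_customer') compiles to the relation dbt built for that model in your development schema, so the generated SQL looks something like this (the database and schema names are placeholders; yours will differ):

select
  customer_key,
  customer_name,
  account_balance
from ANALYTICS.DBT_DEV.stg_tpch_customer  -- placeholder database.schema from the trial setup
where account_balance > 0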

7. Testing and Documentation

dbt encourages tests to enforce data quality:

  • Add to models/staging/tpch/schema.yml:
version: 2

models:
  - name: stg_tpch_customer
    columns:
      - name: customer_key
        tests:
          - not_null
          - unique

Run dbt test to validate your models. Any failures will be reported in the UI.
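Beyond these generic tests, you can also write singular tests: SQL files under tests/ that return the rows that should not exist, failing the test if any come back. A minimal sketch (the file name is assumed):

-- tests/assert_no_negative_balances.sql
-- Fails if any staged customer carries a negative account balance
select *
from {{ ref('stg_tpch_customer') }}
where account_balance < 0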

Generate documentation with dbt docs generate; in dbt Cloud you can then open the auto-generated docs site directly from the IDE (with dbt Core you would preview it locally via dbt docs serve). Explore your project’s lineage graph and column descriptions.
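Model and column descriptions live in the same YAML files and flow straight into those docs. For example, schema.yml could be extended like this (the descriptions are illustrative):

version: 2

models:
  - name: stg_tpch_customer
    description: "Staged TPC-H customers with standardized column names."
    columns:
      - name: customer_key
        description: "Primary key of the customer."
        tests:
          - not_null
          - unique
      - name: account_balance
        description: "Customer account balance from the source system."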

8. Version Control & Deployment

dbt Cloud integrates Git for collaboration. After feature development:

  1. Commit your branch and open a pull request (GitHub, GitLab, Bitbucket).
  2. Merge into main.

In Deploy → Environments, create a production environment. Under Jobs, define a job that runs:

  • dbt seed (if you have local CSV seeds)
  • dbt run
  • dbt test

Schedule the job (e.g., hourly, daily) or trigger it on Git commits. Monitor run history and logs directly in dbt Cloud.

9. Best Practices & Next Steps

  • Modularize models: Break complex logic into smaller models.
  • Document extensively: Use YAML descriptions for sources, models, and columns.
  • Implement CI/CD: Integrate dbt Cloud jobs with your team’s CI pipeline.
  • Leverage analyses: Use the analyses/ folder for ad-hoc, version-controlled queries that dbt compiles but does not materialize.
  • Scale with packages: Reuse community packages such as dbt_utils (see the packages.yml sketch below).

For deeper dives, explore the official dbt documentation and the dbt Hub for community packages and best practices.
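As a concrete starting point for packages, a packages.yml at the project root might look like the sketch below (the version pin is illustrative; check dbt Hub for the current release), followed by dbt deps to install it:

# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1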


Conclusion

In just a few steps, you’ve:

  • Provisioned dbt Cloud via Snowflake
  • Initialized a dbt project with Git
  • Defined raw sources and built staging models
  • Created a production dimension with ref()
  • Tested data quality and generated documentation
  • Set up a CI/CD job for automated deployment

Now you have a repeatable, maintainable analytics pipeline. Keep building new models, add tests, and document as you go—your future self (and team!) will thank you.

