Get Started with Google BigQuery and dbt Cloud
If you are looking to build a modern, scalable data transformation workflow, combining dbt Cloud with Google BigQuery is one of the most powerful stacks available today. In this guide — based on a hands-on walkthrough video — we cover everything from setting up your Google Cloud project to running your first automated deployment job. Whether you are new to analytics engineering or already familiar with SQL-based transformations, this tutorial will get you productive quickly.
In this article:
- What Is dbt and Why Does It Work So Well with Google BigQuery?
- Step 1: Set Up Your Google Cloud Project and BigQuery Dataset
- Step 2: Create a Service Account and Generate a JSON Key
- Step 3: Connect dbt Cloud to BigQuery
- Step 4: Initialize Your dbt Project
- Step 5: Define Your Sources
- Step 6: Build Staging Models
- Step 7: Build a Customer Dimension Model
- Step 8: Create a Deployment Environment and Schedule a Job
- What to Learn Next: Data Vault, dbt, and Scalable Analytics Engineering
- Watch the Video
What Is dbt and Why Does It Work So Well with Google BigQuery?
Before diving into the setup steps, it helps to understand what each tool actually does and why they complement each other so naturally.
dbt (data build tool) is an open-source transformation framework that brings software engineering best practices into data teams. With dbt, your SQL code is version-controlled, tested, documented, and executed through scheduled jobs — the same discipline that software engineers apply to application code. You write modular SQL models, dbt compiles them, and the actual query execution happens entirely on your data platform.
Google BigQuery is a fully managed, serverless cloud data warehouse built for large-scale analytics. It handles the storage and compute, scales automatically, and integrates tightly with the broader Google Cloud ecosystem. Critically for this tutorial, BigQuery hosts publicly available datasets you can query immediately — no manual data loading required to get started.
Together, dbt and BigQuery form a clean separation of concerns: dbt manages your transformation logic, documentation, and pipeline orchestration, while BigQuery handles the heavy lifting of storing and querying your data at scale. This makes the combination especially attractive for analytics engineering teams that want speed, reliability, and maintainability.
Step 1: Set Up Your Google Cloud Project and BigQuery Dataset
Everything starts in the Google Cloud Console. If you do not already have a Google Cloud account, you can start for free — Google offers around $300 in credits with no credit card required upfront, which is more than enough for this quickstart.
Once you are in the console, navigate to the Cloud Resource Manager and create a new project. For this demo, a name like bigquery-dbt-quickstart works well. After the project is created, head to the BigQuery console from the main navigation menu.
In BigQuery, you will immediately notice that the terminology differs slightly from traditional databases. Instead of database → schema → table, BigQuery uses project → dataset → table. The concepts map directly, but knowing this terminology prevents confusion when you are configuring dbt later.
Create a new dataset inside your project — call it something like dbt_demo. During creation, you will be asked to set a data location (choose a region that makes sense for your use case, or a multi-region option like US or EU), an optional expiration policy, and an encryption setting. The defaults are fine for a quickstart.
The public tutorial data you will be using lives in a project called dbt-tutorial, under a dataset named jaffle_shop. It contains customer and order tables — exactly what you need to build a meaningful first model.
Step 2: Create a Service Account and Generate a JSON Key
For dbt Cloud to run queries in BigQuery on your behalf, it needs authenticated access. The most straightforward approach for a quickstart is a Google Cloud service account with a JSON key file.
In the Google Cloud console, use the Credentials Wizard under APIs & Services to create a new service account. Name it something like dbt-user. Assign it two roles:
- BigQuery Data Editor — allows the service account to create and modify tables
- BigQuery User — allows it to run jobs against the project
Once the service account is created, open it and navigate to the Keys tab. Create a new JSON key. This downloads a file to your local machine — keep it secure and do not commit it to version control. You will upload it to dbt Cloud in the next step.
Note: In a production environment, you would typically use OAuth or Workload Identity Federation rather than a JSON key file. For this quickstart, the service account key is the simplest and most transparent option.
Step 3: Connect dbt Cloud to BigQuery
With your service account key ready, log in to dbt Cloud and create a new project. When prompted to set up a connection, choose BigQuery and upload the JSON key file you just downloaded. dbt Cloud will parse the file and automatically populate your project and credential details.
Set a target name for the connection — something like dbt-demo — and save. For the repository, this demo uses dbt’s managed Git repository, which is the quickest option. In a real production project, you would connect dbt Cloud to your own GitHub, GitLab, or Azure DevOps repository and use pull requests and CI/CD pipelines to manage code changes properly.
Step 4: Initialize Your dbt Project
Open the dbt Cloud Studio (the browser-based IDE). You will see a button in the top-left to initialize your dbt project. Click it — dbt will automatically generate the standard project scaffold, including the dbt_project.yml file and a default folder structure.
The dbt_project.yml file is the configuration file that tells dbt this folder is a dbt project. It is where you set the project name, define global defaults, and configure materialization strategies per folder. Every SQL model you place inside the models/ folder from this point forward can be compiled, tested, documented, version-controlled, and executed against BigQuery automatically.
You will also notice that the initializer creates some example models. Delete the examples/ subfolder inside models/ to start with a clean structure — but be careful not to delete the models/ folder itself.
Once your project structure looks clean, commit the initial files with a message like project initialization and you are ready to start building.
Step 5: Define Your Sources
In dbt, sources are the starting point of every project. They tell dbt where your raw input data lives — which project, dataset, and table — without hardcoding those references directly into each model file. This makes your project far more maintainable: if a source location changes, you update it in one place and every model that uses it automatically picks up the change.
Create a new file in your models/ folder and name it sources.yml. The structure always starts with version: 2 (required for the current dbt YAML schema), followed by a sources: key. Under that, define a source with a name (for example, jaffle_shop), specify the BigQuery project as the database (dbt-tutorial), the dataset (jaffle_shop), and list the individual tables you want to reference — in this case, customers and orders.
After saving the file, the dbt Cloud lineage panel will update to show your newly recognized sources. They will appear as the entry points in your project’s data lineage graph.
Step 6: Build Staging Models
A core best practice in dbt is to never reference raw source tables directly in your business logic models. Instead, you create a thin staging layer — one model per source table — that lightly cleans and standardizes the data. Typical staging transformations include renaming columns to consistent naming conventions, casting data types, and filtering out clearly invalid records.
dbt Cloud makes this step even easier: the lineage view shows a link next to each detected source table that lets you auto-generate a staging model scaffold. Click it, and dbt generates a boilerplate SQL file using Common Table Expressions (CTEs) and the source() macro.
The source() macro is one of the first things that makes dbt feel different from plain SQL. Instead of writing SELECT * FROM dbt-tutorial.jaffle_shop.customers, you write SELECT * FROM {{ source('jaffle_shop', 'customers') }}. dbt compiles this at runtime into the correct fully qualified BigQuery reference, keeping your code clean and location-independent.
In the staging model for customers, rename the id column to customer_id for clarity. Save the file and use the Preview button to verify that dbt can query the data from BigQuery — this confirms your connection and credentials are working correctly.
When you are ready to persist the model, click Build. dbt will create the model in BigQuery under your personal development dataset — a sandboxed schema tied to your dbt Cloud user account. This means you can build and iterate on models freely without touching anything in your production environment.
If you switch back to the BigQuery console at this point, you will find a new dataset named after your dbt username, and inside it, a view representing your staging model. Repeat the same process for the orders source table.
Step 7: Build a Customer Dimension Model
With your staging models in place, you are ready to build a more meaningful transformation: a customer dimension that joins customer data with order history.
Create a new folder inside models/ called marts/ (a common convention for final output models), and inside it create a file called dim_customers.sql.
In this model, instead of the source() macro, you will use the ref() function — the other core dbt macro. Where source() points to raw input tables, ref() points to other dbt models you have already defined. Writing {{ ref('stg_jaffle_shop__customers') }} does two things: it generates the correct BigQuery reference at compile time, and it tells dbt that this model depends on the staging model — building the dependency graph that dbt uses to determine build order, enable lineage tracking, and support documentation.
Build out the model with a few CTEs: one that selects from your customers staging model, one that aggregates order data per customer (first order date, most recent order date, and total order count), and a final CTE that joins the two together. The lineage panel will update to show the full data flow from raw sources through staging into the dimension model — a powerful visualization, especially as your project grows.
Build the model, verify it in BigQuery, then commit your changes and merge to main.
Step 8: Create a Deployment Environment and Schedule a Job
Development work happens in personal dev environments. Production data transformations require a deployment job — a configured, scheduled run that executes your models against your production BigQuery dataset.
In dbt Cloud, navigate to Orchestration → Environments and create a production environment. You will need at minimum one development environment and one production environment. Attach a connection profile to the production environment that points to the dbt_demo dataset you created earlier — this is where production models will be materialized.
Then go to Orchestration → Jobs → Deploy Job and create a new job. Name it something like Production Daily, select your production environment, and configure the run command as dbt build (which compiles, runs, and tests all models in one step). Enable a schedule — for example, daily at 2:00 AM using either the UI scheduler or a cron expression for more precise control.
You can also trigger the job manually to verify it runs successfully. The job run view shows the status of each model — pass, fail, or skip — making it easy to spot and debug issues in production.
What to Learn Next: Data Vault, dbt, and Scalable Analytics Engineering
What you have built in this walkthrough — connecting dbt Cloud to Google BigQuery, defining sources, creating staging and dimension models, and scheduling a deployment job — represents the foundational workflow you will use in every real analytics engineering project.
But this is just the beginning. dbt supports a wide range of advanced features: custom tests, documentation generation, macros and Jinja templating, incremental models for large datasets, snapshots for slowly changing dimensions, and much more.
If you want to go deeper and learn how to combine dbt with structured modeling methodologies, Data Vault 2.0 is the industry standard for building scalable, auditable, and future-proof enterprise data warehouses. At Scalefree, we offer comprehensive Data Vault 2.1 training and certification programs that cover everything from foundational concepts to advanced implementation patterns — including how to integrate dbt into a Data Vault workflow on platforms like BigQuery.
Whether you are just starting your analytics engineering journey or looking to formalize your team’s approach to data modeling, our Data Vault 2.1 certification is the most direct path to building production-grade data platforms with confidence.
Check out our other platform-specific tutorials — including guides for Snowflake — and explore our webinars and training programs to take your data engineering skills to the next level.

