Testing Data Pipelines
In today’s data-driven world, the reliability of your analytics depends on the integrity of your data pipelines. Even the most sophisticated transformations can be undermined by bad source data, schema changes, or simple human error. That’s where dbt (data build tool) shines: it provides a framework not only for transforming and modeling your data, but also for validating it at every step. In this article, we’ll explore five essential testing strategies you can implement with dbt to catch issues early, enforce data contracts, and build confidence in your analytics.
1. Custom SQL Tests
What they are: Custom SQL tests (singular tests) allow you to write bespoke SQL queries against your models to enforce complex business rules or edge-case validations. They live as .sql files in your project's tests/ directory and execute as part of your dbt test suite.
Why you need them: Out-of-the-box tests cover many common cases, but sometimes you have unique conditions—like “no customer may have more than one active subscription”—that require a tailored query.
-- Example: Ensure no customer has more than one active subscription
SELECT
    customer_id,
    COUNT(*) AS active_sub_count
FROM {{ ref('subscriptions') }}
WHERE status = 'active'
GROUP BY customer_id
HAVING COUNT(*) > 1
If this query returns any rows, the test will fail, alerting you to a data integrity issue before downstream models consume bad data.
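In practice, a query like this lives as its own .sql file under tests/ (note that dbt wraps the query, so it should not end in a semicolon). As a minimal sketch, assuming a hypothetical file name of assert_single_active_subscription.sql, you can also tune how dbt reacts to failures with a config block:
-- tests/assert_single_active_subscription.sql (hypothetical file name)
-- Report a warning instead of a hard failure while the rule is being rolled out
{{ config(severity='warn') }}

SELECT
    customer_id,
    COUNT(*) AS active_sub_count
FROM {{ ref('subscriptions') }}
WHERE status = 'active'
GROUP BY customer_id
HAVING COUNT(*) > 1
Switch the severity back to 'error' once the upstream data is clean and you want violations to block the run.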
2. Personalized Macros
What they are: Macros in dbt are reusable snippets of SQL or Jinja logic. By building custom macros for your testing patterns, you can enforce consistent checks without repeating verbose SQL.
Why you need them: Many projects share recurring validation needs—like ensuring date fields are never in the future or that monetary amounts are non-negative. A macro lets you encapsulate that logic once and apply it across multiple models.
-- tests/generic/expect_dates_in_past.sql (generic tests can also live under macros/)
-- A generic test is a special kind of macro defined with a test block;
-- dbt passes in the model and column it is attached to
{% test expect_dates_in_past(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} > current_date

{% endtest %}

# Usage in schema.yml
models:
  - name: orders
    columns:
      - name: order_date
        tests:
          - expect_dates_in_past
With just a few lines in your YAML, you’ve enabled a robust, maintainable check across your entire project.
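The same pattern covers the non-negative amounts example mentioned above. A minimal sketch (the test name expect_non_negative, its file location, and the payments model are illustrative):
-- macros/expect_non_negative.sql (hypothetical file)
{% test expect_non_negative(model, column_name) %}

-- Return any rows where the amount dips below zero
select *
from {{ model }}
where {{ column_name }} < 0

{% endtest %}

# In schema.yml, attach it to any monetary column
models:
  - name: payments
    columns:
      - name: amount
        tests:
          - expect_non_negative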
3. Built-In dbt Data Tests
What they are: dbt ships with four standard data tests: unique, not_null, accepted_values, and relationships. These cover the most frequent requirements for uniqueness, presence, domain constraints, and referential integrity.
Why you need them: They require zero custom SQL. You simply declare your expectations in your model's .yml file, and dbt will generate and execute the underlying queries.
models:
  - name: users
    columns:
      - name: user_id
        tests:
          - unique
          - not_null
      - name: country_code
        tests:
          - accepted_values:
              values: ['US', 'CA', 'MX']
Within minutes, you’ve added core validation checks to your critical tables, ensuring nulls or duplicates never slip through.
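The relationships test mentioned above follows the same declarative style. A short sketch (the model and column names are illustrative) that checks every order points at an existing user:
models:
  - name: orders
    columns:
      - name: user_id
        tests:
          - relationships:
              to: ref('users')
              field: user_id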
4. Unit Tests
What they are: Unit tests isolate a model’s logic by feeding it controlled input data and verifying that its output matches expected results. In dbt, you can implement unit tests using seed files or temporary models.
Why you need them: When your transformations involve complex calculations—like computing a customer’s lifetime value—you want airtight proof that the logic works under all conditions, including edge cases.
-- seed file: seeds/lv_input.csv
customer_id,order_amount
1,100
1,200
2,50
2,75

-- singular test: tests/lv_test.sql
with input_data as (
    select * from {{ ref('lv_input') }}
),
calculated as (
    select
        customer_id,
        sum(order_amount) as lifetime_value
    from input_data
    group by customer_id
)
select *
from calculated
-- returns rows (and fails the test) only when a calculated value misses its expectation
where not (
    (customer_id = 1 and lifetime_value = 300)
    or (customer_id = 2 and lifetime_value = 125)
)
If the calculated results don't match the hardcoded expectations, the test fails, pinpointing exactly where your logic diverged.
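If you are on dbt 1.8 or later, the same idea can be expressed with dbt's native unit tests in YAML, which mock the inputs and assert on the output without needing seed files. A sketch, assuming a hypothetical model named customer_lifetime_value that aggregates an orders input:
unit_tests:
  - name: lifetime_value_sums_orders      # hypothetical test name
    model: customer_lifetime_value        # assumed model under test
    given:
      - input: ref('orders')              # mocked rows replace the real orders model
        rows:
          - {customer_id: 1, order_amount: 100}
          - {customer_id: 1, order_amount: 200}
          - {customer_id: 2, order_amount: 50}
          - {customer_id: 2, order_amount: 75}
    expect:
      rows:
        - {customer_id: 1, lifetime_value: 300}
        - {customer_id: 2, lifetime_value: 125}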
5. dbt Model Contracts
What they are: Model contracts let you define strict schema expectations (column types, required fields, accepted value sets) directly in your schema.yml files. They act as formal agreements between your data producers and consumers.
Why you need them: Without explicit contracts, a subtle change in upstream data can silently break your analytics. Contracts ensure that any structural or type deviations immediately surface as test failures.
models:
  - name: orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: integer
        tests:
          - not_null
      - name: order_amount
        data_type: decimal
        tests:
          - not_null
With contracts in place, if someone accidentally returns order_amount as text or drops a required field, dbt stops execution and flags the violation, protecting downstream dashboards and reports.
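Where your warehouse supports it, the contract can also carry column-level constraints, so a violation is rejected when the model is built rather than caught by a test afterwards. A minimal sketch (constraint support varies by adapter):
models:
  - name: orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: integer
        constraints:
          - type: not_null   # enforced by the warehouse at build time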
Putting It All Together: A Comprehensive Testing Strategy
Testing in dbt is more than an afterthought—it’s a core pillar of a robust analytics engineering workflow. By combining custom SQL tests, personalized macros, built-in data tests, unit tests, and model contracts, you establish multiple layers of defense against data quality issues. Here’s a sample workflow:
- YAML Configuration: Start by declaring built-in tests and contracts in your schema.yml for each model.
- Custom Tests: Add bespoke SQL tests in tests/ for any project-specific rules.
- Macros: Create a macros/ folder with reusable test macros for common patterns.
- Unit Tests: Define seed files and test models under tests/unit/ to validate critical transformations.
- CI/CD Integration: Hook dbt test into your CI pipeline so that every pull request runs the full suite, ensuring no faulty code or data reaches production (see the sketch after this list).
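As a rough sketch of that CI step (the workflow file, adapter, and profile setup are placeholders; adapt them to your own warehouse and CI system), a GitHub Actions job could run the full suite on every pull request:
# .github/workflows/dbt-ci.yml (hypothetical workflow)
name: dbt-ci
on: pull_request
jobs:
  dbt-test:
    runs-on: ubuntu-latest
    env:
      DBT_PROFILES_DIR: .   # assumes a CI-specific profiles.yml in the repo root
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-postgres   # choose the adapter for your warehouse
      - run: dbt deps
      - run: dbt seed
      - run: dbt run
      - run: dbt test   # fails the pull request if any test fails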
This layered approach not only catches errors early but also documents your data expectations for new team members and stakeholders.
Conclusion
Ensuring the accuracy, reliability, and trustworthiness of your data requires more than ad-hoc checks—it demands a structured, repeatable testing framework. dbt provides the tools you need, from built-in quick-start tests to fully customized SQL validations and contracts. By implementing these five testing methods, you’ll be well on your way to bulletproofing your data pipelines and empowering stakeholders with confidence in their analytics.
Ready to level up your data quality? Start by adding one new test to your next dbt model, and watch your data reliability soar.
Meet the Speaker

Hernan Revale
Senior Advisor and Head of Research
Hernan Revale has been working in Business Intelligence supporting Scalefree International since 2022 as a BI Consultant. Prior to Scalefree, he had over three years of experience as an independent consultant in the areas of business intelligence, strategic planning, and analytics. He was the General Manager of the Research and Technology Transfer area of a National University in Argentina. Hernan has an MSc with Distinction in Business Analytics from Imperial College London and is a Certified Data Vault 2.0 Practitioner. He is also a university professor and researcher, with multiple presentations at conferences and in indexed journals.