Scalefree Knowledge Webinars · Expert Sessions · dbt Talk
dbt Source Freshness: Ensuring Reliable and Timely Data in Your Pipeline

dbt Source Freshness

Data teams rely on timely, accurate, and complete data to support dashboards, KPIs, reporting, and data-driven decision making. But even the most advanced data models and transformation logic cannot fix one critical issue: stale or outdated upstream data. This is where dbt Source Freshness becomes one of the most valuable quality checks in your analytics engineering toolkit.

In this article, we take a close look at what dbt Source Freshness is, why it matters, how it works under the hood, and how you can configure and run freshness checks both locally and in dbt Cloud. If your organization depends on reliable data pipelines—or if you’ve ever discovered too late that a report was built on old data—this guide will help you avoid those costly surprises.



What Is dbt Source Freshness?

Source freshness in dbt is a built-in mechanism that measures how up-to-date data is in your defined source tables. While data transformations can apply logic, aggregations, and business rules, they inherently depend on data arriving on time. If source data is delayed, incomplete, or entirely outdated, every model downstream will reflect that delay.

dbt Source Freshness provides a simple, reliable indicator of whether the data you are working with is fresh enough to support your operational and analytical processes. It helps you answer one crucial question:

“Is the data I’m transforming actually the latest data available?”

When freshness checks are enabled, dbt reads the most recent timestamp from a specified column in your source table and determines whether its age violates your defined freshness thresholds. These thresholds act as data SLAs for your pipeline.

Why Source Freshness Matters

The importance of monitoring source data freshness cannot be overstated. When upstream data is stale, the consequences cascade throughout your entire analytics ecosystem. Dashboards may show outdated KPIs. Operational teams may make decisions based on incomplete numbers. Forecasts and reports may misrepresent the true state of the business.

One scenario that many data teams have encountered illustrates the problem perfectly: a business report runs on what everyone assumes is the latest data. After a few weeks, the team discovers that the upstream system had stopped updating its tables entirely. What appeared to be fresh data was actually months old. As a result, the report generated incorrect metrics for an extended period.

With source freshness monitoring in place, delays like these can be caught immediately. dbt highlights them clearly, allowing teams to:

  • Detect upstream system failures.
  • Identify delays in ingestion or replication pipelines.
  • Enforce data delivery SLAs with source system owners.
  • Stop inaccurate transformations from running on stale data.

Freshness checks turn what could be a hidden issue into a transparent, actionable signal.

How dbt Source Freshness Works

Source freshness configuration lives directly inside the YAML file where your source is defined. This design decision is intentional—freshness belongs to the source, not to downstream models. Each source or table can have its own customized freshness rules.

A typical source block with freshness configuration looks like this:

sources:
  - name: my_source
    tables:
      - name: orders
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}

Let’s break down the key components.

loaded_at_field

This is the timestamp column dbt uses to determine when the most recent record arrived. dbt queries this field, finds the newest timestamp, and calculates its age relative to the current time.

Important: dbt always evaluates freshness in UTC time. If your source system stores local timestamps (e.g., CET, EST), the value in loaded_at_field must be converted to UTC.
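Because loaded_at_field accepts a SQL expression as well as a bare column name, the conversion can happen right in the source definition. A sketch assuming a Snowflake warehouse (convert_timezone is Snowflake-specific; other warehouses have equivalent functions):

```yaml
sources:
  - name: my_source
    tables:
      - name: orders
        # Convert a Berlin-time timestamp to UTC before dbt measures its age
        loaded_at_field: "convert_timezone('Europe/Berlin', 'UTC', updated_at)"
        freshness:
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}
```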

Thresholds: warn_after and error_after

Freshness thresholds define what “fresh enough” means. dbt compares the age of the newest record with these time limits and returns one of three statuses:

  • pass – the data is within the acceptable freshness window.
  • warn – the data is late but not critically late.
  • error – the data is beyond the maximum acceptable age.

These thresholds effectively act as SLAs, helping teams formalize expectations about data arrival. For example:

  • Warn after 24 hours.
  • Error after 48 hours.

If the source table hasn’t received new records in over 48 hours, dbt marks the freshness check as an error, signaling that the table is unreliable until updated.
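Thresholds can also be declared once at the source level and overridden per table, which keeps the SLA in one place for most tables. A sketch with hypothetical table names:

```yaml
sources:
  - name: my_source
    loaded_at_field: updated_at
    freshness:                         # default SLA for every table in this source
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}
    tables:
      - name: orders                   # inherits the source-level thresholds
      - name: audit_log
        freshness:                     # override: this table may legitimately lag
          warn_after: {count: 3, period: day}
```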

What Happens During a Freshness Check?

When you run a freshness check, dbt performs a straightforward but effective procedure:

  1. dbt queries the loaded_at_field and finds the most recent timestamp.
  2. It calculates the time difference between that timestamp and the current UTC time.
  3. It compares the age of the data to your defined thresholds.
  4. It returns a pass, warn, or error result.

This process is intentionally lightweight and fast. It avoids unnecessary complexity while giving teams a dependable, high-value signal about upstream data timeliness.
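The steps above can be sketched in a few lines of Python. This is a simplified illustration of the comparison logic, not dbt's actual implementation:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def freshness_status(max_loaded_at: datetime,
                     warn_after: timedelta,
                     error_after: timedelta,
                     now: Optional[datetime] = None) -> str:
    """Classify the age of the newest record against the thresholds."""
    now = now or datetime.now(timezone.utc)
    age = now - max_loaded_at          # step 2: age of the newest record
    if age > error_after:              # steps 3-4: compare and classify
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"

# Example: the newest row is 30 hours old -> late, but not critically late
now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
status = freshness_status(now - timedelta(hours=30),
                          warn_after=timedelta(hours=24),
                          error_after=timedelta(hours=48),
                          now=now)
print(status)  # warn
```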

How to Run Freshness Checks in dbt

Running a freshness check in dbt is simple. The main command is:

dbt source freshness

This command evaluates freshness for all sources that have freshness configurations defined. You can also target a specific source or table:

dbt source freshness --select source:my_source
dbt source freshness --select source:my_source.orders

When executed, dbt displays the freshness status for each table along with metadata such as:

  • The latest timestamp found.
  • The calculated age of the data.
  • The threshold values used.
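Each run also writes these results to a machine-readable artifact (target/sources.json), which is handy for alerting. A small sketch of reading it; the inline sample is trimmed and illustrative, and the exact field names can vary across dbt versions:

```python
import json

# A trimmed sample in the shape of target/sources.json (illustrative only)
sample = """
{
  "results": [
    {
      "unique_id": "source.my_project.my_source.orders",
      "status": "warn",
      "max_loaded_at": "2024-01-02T06:00:00+00:00"
    }
  ]
}
"""

artifact = json.loads(sample)
# Collect every source that is not passing its freshness check
stale = [r["unique_id"] for r in artifact["results"] if r["status"] != "pass"]
print(stale)  # ['source.my_project.my_source.orders']
```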

Running Freshness Checks in dbt Cloud

dbt Cloud makes managing freshness checks even easier. You can create a dedicated job that runs only freshness checks, or you can add freshness as a step in a larger job. This enables automatic monitoring without requiring manual execution.

Once the job completes, results appear directly in the dbt Cloud UI. For each table, you can see:

  • The age of the most recent record.
  • Whether the table passed, warned, or errored.
  • When the freshness check was last executed.

You can also inspect the detailed logs to understand exactly how dbt evaluated each source.

Why Freshness Checks Should Be a Standard Practice

In modern analytics engineering, data reliability is just as important as transformation logic. Freshness checks are a lightweight yet powerful way to ensure that your source systems are delivering data on time.

Without freshness checks, data issues may go unnoticed until they have already impacted dashboards, stakeholder decisions, or downstream processes. With freshness monitoring enabled, you gain visibility into problems early, allowing your team to respond quickly and prevent incorrect reporting.

As data ecosystems grow more complex—with multiple ingestion pipelines, third-party APIs, and event-based systems—freshness checks provide a simple, standardized way to maintain trust in your data.


Meet the Speaker


Dmytro Polishchuk
Senior BI Consultant

Dmytro Polishchuk has 7 years of experience in business intelligence and works as a Senior BI Consultant for Scalefree. He is a proven Data Vault 2.0 expert with excellent knowledge of various (cloud) architectures, data modeling, and the implementation of automation frameworks. He excels in team integration and structured project work, and holds a bachelor's degree in Finance and Financial Management.
