
How to Validate Data Freshness in dbt Cloud

Ensuring that your data is fresh, reliable, and aligned with your SLAs is one of the most important responsibilities of any analytics engineering or BI team. In modern data stacks, dbt Source Freshness plays a key role in validating that upstream systems are loading data as expected. When used properly, it helps teams identify delays, pipeline failures, or missing updates before they impact models and downstream reporting.

This article walks through a full demo of how to configure, run, and monitor Source Freshness checks in dbt Cloud. It builds on the fundamentals introduced in the first video of our series, where we explained what source freshness is, why it matters, and how dbt evaluates freshness. If you haven’t watched that introduction yet, we recommend doing so first.

In this second part, we go hands-on: reviewing source configurations, running freshness checks using both fields and custom SQL queries, applying optional filters, triggering warnings and failures, and inspecting the results in the dbt Cloud UI and Catalog. By the end, you’ll have a clear understanding of how to integrate freshness checks into your workflows and jobs to maintain a highly trustworthy data foundation.



Understanding the Source Freshness Configuration

The demo starts inside a dbt Cloud project with a YAML file containing our source definitions. For this walkthrough, we are working with a source called dbt_talk_demo_sources, which includes two tables:

  • customer_source
  • employee_source

Inside the configuration block, we define the core freshness thresholds:

  • warn_after: 30 minutes
  • error_after: 60 minutes

These settings tell dbt when to flag a source as slightly stale (warning) or critically outdated (error). They are typically aligned with SLAs and expectations for how often upstream data should be updated.
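In a sources YAML file, these thresholds are expressed with dbt's warn_after and error_after keys, each taking a count and a period. A minimal sketch using the demo's source and table names (database and schema details are omitted here):

sources:
  - name: dbt_talk_demo_sources
    freshness:
      warn_after: {count: 30, period: minute}
      error_after: {count: 60, period: minute}
    tables:
      - name: customer_source
      - name: employee_source

Defined at the source level like this, the thresholds apply to every table underneath; individual tables can still override them.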

Using loaded_at_field

For the customer_source table, we use a basic configuration: the table includes an updated_at timestamp column, which dbt reads directly to calculate freshness. However, the timestamps in this demo are recorded in CET (Europe/Berlin), which means dbt converts them to UTC before evaluating freshness. This highlights a common real-world consideration: time zones must always be handled consistently in freshness checks.
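In the sources file, this amounts to a single property on the table entry; a sketch (the thresholds from the source-level block above still apply):

    tables:
      - name: customer_source
        loaded_at_field: updated_at  # CET in the demo; dbt normalizes to UTC before comparing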

Using loaded_at_query

For employee_source, we use a different approach. This table does not store a timestamp column. Instead, the load timestamps are stored in a metadata table. To handle this, we configure a loaded_at_query—a SQL query that retrieves the latest load time externally. This method is often used when:

  • Timestamps come from an ETL metadata or logging table
  • Data loads use high-watermark patterns
  • You want more control over how freshness timestamps are calculated

In the demo, the query simply selects the MAX(updated_at) value from the metadata table. While simple, it demonstrates how flexible dbt is when working with custom data loading patterns.
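A sketch of how this could look in YAML, assuming a hypothetical metadata table named etl_load_log (the exact placement of loaded_at_query can vary by dbt version; in recent releases it is supported as an alternative to loaded_at_field):

    tables:
      - name: employee_source
        config:
          loaded_at_query: |
            select max(updated_at)
            from my_db.meta.etl_load_log  -- hypothetical metadata table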

Using Optional Filters

dbt also supports an optional filter configuration, which lets you skip certain rows when evaluating freshness. For example, if a table contains soft-deleted records or historical rows that should not count toward freshness checks, you can filter them out. In our demo, the filter excludes rows where deleted = TRUE, ensuring only active records contribute to the freshness calculation.
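In YAML, the filter is a SQL predicate inside the freshness block; a sketch matching the demo's soft-delete flag (the exact boolean syntax depends on your warehouse):

    tables:
      - name: customer_source
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 30, period: minute}
          error_after: {count: 60, period: minute}
          filter: deleted = false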

This becomes particularly useful when soft-deleted or otherwise invalid records carry newer timestamps than the latest valid rows: without the filter, the source would appear fresher than it really is, masking actual loading issues.

Running Freshness Checks in dbt Cloud

With the configuration in place, we run our first freshness check via the CLI:

dbt source freshness

Before running the check, we insert new rows into the source tables so that the latest data delay is around 20 minutes. Since this is below both the warning and error thresholds, the run reports everything as green.

This confirms that both types of configurations—loaded_at_field and loaded_at_query—are working as expected.

Triggering a Warning

Next, we enable the earlier-mentioned filter configuration on customer_source. After filtering out the deleted rows, the next valid record has a delay of about 50 minutes. When we run:

dbt source freshness -s source:dbt_talk_demo_sources.customer_source

dbt reports a warning state, because the threshold of 30 minutes is exceeded. This demonstrates how filtering can impact the evaluation in meaningful ways.

Triggering a Failure

To understand how a failed freshness check behaves, we insert data with delays exceeding the 60-minute error threshold. Running the same command again produces an error state. The dbt output also shows the exact SQL query it executed to determine freshness—useful when troubleshooting unexpected results.
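The generated query differs by adapter, but conceptually it is a small aggregation along these lines (table and column names are taken from the demo; the database, schema, and timestamp handling shown here are assumptions):

    select
      max(updated_at) as max_loaded_at,
      current_timestamp as snapshotted_at
    from my_db.my_schema.customer_source
    where deleted = false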

Including Freshness in dbt Cloud Jobs

dbt Cloud provides two ways to incorporate freshness checks into scheduled jobs:

Option 1: “Run Source Freshness” Checkbox

With this option enabled, dbt automatically runs dbt source freshness as the first step of the job. However, failures do not stop the rest of the job from executing. This mode is ideal when you want visibility but don’t want freshness violations to block model builds.

Option 2: Adding Freshness as a Job Step

Alternatively, you can include freshness checks as an explicit job step. In this case, if freshness fails, subsequent steps are skipped and the job fails. This is the preferred option when:

  • Data reliability is critical
  • Your models depend on up-to-date sources
  • You want strong enforcement of data SLAs

The demo shows examples of both approaches, so you can choose which one best fits your project needs.
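For reference, an explicit-step job might be defined with commands like the following (dbt build as the follow-up step is an assumption; substitute your project's own run and test commands). If the freshness step errors, dbt Cloud skips the remaining steps and marks the job as failed:

dbt source freshness
dbt build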

Monitoring Freshness in the dbt Cloud Catalog

dbt Cloud makes it easy to monitor freshness results long after the run completes. In the Catalog, you can drill down into each source and see the most recent freshness status, including warnings and errors. This gives data teams better visibility into upstream issues without needing to dive into logs.

For example, in our demo environment, the Catalog displays a warning icon for dbt_talk_demo_sources. Opening the source reveals the individual freshness statuses for each table. This is especially helpful in larger projects where tracking freshness manually would be impractical.

Key Takeaways

This demo highlights the full power and flexibility of dbt Source Freshness in real-world analytics environments. Here are the main lessons:

  • Freshness thresholds provide an essential guardrail for data reliability.
  • loaded_at_field is the simplest option when the load timestamp is stored directly in the source table.
  • loaded_at_query enables more advanced scenarios using external metadata.
  • Filters help refine which rows count toward freshness.
  • dbt distinguishes between OK, warning, and error states in a clear, actionable way.
  • Freshness checks can run as part of your dbt Cloud jobs with configurable strictness levels.
  • The dbt Cloud Catalog provides ongoing visibility into the freshness of all sources.

By combining these tools, you can ensure your source data stays timely, trustworthy, and perfectly aligned with your organization’s SLAs. This ultimately improves downstream analytics quality, enhances user confidence, and reduces the risk of building insights on outdated data.

Meet the Speaker

Dmytro Polishchuk
Senior BI Consultant

Dmytro Polishchuk has seven years of experience in business intelligence and works as a Senior BI Consultant for Scalefree. He is a proven Data Vault 2.0 expert with excellent knowledge of various (cloud) architectures, data modeling, and the implementation of automation frameworks. He excels in team integration and structured project work and holds a bachelor’s degree in Finance and Financial Management.
