Monitoring data quality: from dbt tests to anomaly detection for analytics pipelines
An ELT pipeline running green every day doesn't mean the data is correct — structural tests only catch part of the problem. How to add freshness, volume, and anomaly-detection layers that catch data errors before someone notices a wrong number on a dashboard.
An ELT pipeline that runs green every morning — the DAG completes, no task fails — creates a false sense of safety. But "finished running" and "the data is correct" are two different things: a job can complete successfully while loading zero rows because a source API changed its response format, or it can load the expected row count with one column silently shifted because of an undocumented schema change. These failures are usually caught late — when someone asks why revenue dropped 90% overnight on a dashboard — by which point the bad data has already flowed into reports, and possibly into decisions.
This article describes the data-quality monitoring layer we add once a base ELT pipeline is already running reliably (see our article on building an ELT pipeline with dbt and Airflow) — not a single tool to buy, but a set of checks added incrementally, from cheap and simple to more sophisticated as the need arises.
Four common categories of data quality failure
- **Freshness** — data arrives later than expected. The job still runs on schedule, but the source stopped updating days ago, so the "latest" data in the warehouse is actually stale
- **Completeness** — missing records. An API returns paginated results but the job only pulls the first page, or a query filter accidentally drops a valid segment of records
- **Schema drift** — a source renames a column, changes a data type, or adds/removes a field without notice — most common when the source is a service owned by another team or a third party
- **Distribution anomalies** — the data is structurally valid and complete, but the values are abnormal relative to history: an order with a negative quantity, or a single day's revenue 50x the average due to a double-counting bug
The first three are caught by structural tests. The fourth — distribution anomalies — needs a different layer, because data that's structurally "valid" can still be wrong in business meaning.
dbt's built-in tests — necessary, not sufficient
dbt tests (not_null, unique, relationships, accepted_values, plus custom tests) are the cheapest layer and should be the first one added to every important staging and mart model. Running them in CI catches errors before merge; re-running them after every production pipeline run catches real data issues that surface later. dbt source freshness — checking a source's latest timestamp against an expected threshold — is the natural extension, catching the "job ran but the source went quiet" failure mode that ordinary structural tests miss.
The limitation: these tests only know about structure and declared constraints — not_null doesn't catch a column that always has a value, just the wrong one. A table can pass 100% of its dbt tests while revenue is double-counted because of a join that silently duplicated rows.
Statistical anomaly detection — the layer structural tests can't see
The next layer monitors business metrics over time, not table structure:
- **Volume checks** — compare today's row count against a rolling 7- or 14-day average, with tolerance bands by day of week (weekends naturally have fewer orders — that isn't an anomaly)
- **Z-score on key metrics** — revenue, order count, daily active users — flag values that deviate beyond N standard deviations from that metric's own historical distribution
- **Dynamic thresholds, not fixed ones** — a fixed threshold ("alert if orders < 100/day") quickly goes stale as the business grows or has seasonality; a rolling window adjusts itself to the recent trend
The practical payoff: an 80% drop in daily orders compared to the 7-day average gets flagged automatically within hours of the pipeline run — before someone on the business side notices it in a weekly report and asks the data team what happened.
Turning failures into actionable signals — alerting and a quality score
Detecting a failure has no value if nobody acts on it in time, but too many alerts without context lead to alert fatigue — and eventually to being ignored entirely. Some principles for routing alerts:
- Every alert should state: which table, which test/check failed, since when, and which mart/dashboard depends on that table — the recipient shouldn't have to trace lineage manually to gauge severity
- Tier severity by impact — a failure in a mart actively used by customer-facing dashboards should route differently (page immediately) than a failure in a rarely-touched internal intermediate table (log it, review daily)
- Roll up all test, freshness, and anomaly results into a single quality-score dashboard per table/source — letting the data team see quality trends over time, not just react to individual alerts
Design principle: catch failures as close to the source as possible
This extends the data contract concept from our ELT article: a data error caught at the staging layer — before it propagates into intermediate tables and marts — is far cheaper to fix than one caught at the dashboard, because at the staging layer no report has been built on the bad data yet. The closer monitoring sits to the source, the shorter the time between when an error occurs and when it's detected — and that gap matters more than the raw number of tests written.
This principle applies directly to systems that depend on clean data to make automated decisions, like the ingredient demand forecasting model we built for 240 F&B stores — an anomaly missed at the input layer turns into a wrong forecast, then into a wrong purchasing decision, well before anyone traces the root cause back to a single bad day of upstream data.
Conclusion
Data quality monitoring doesn't require a large tool to get started — dbt tests and source freshness checks already catch most common operational failures at close to zero cost. Statistical anomaly detection is the next step once the pipeline is stable and the risk shifts from "job failed" to "job ran, but the data is wrong in meaning." This is how we approach monitoring within the Data Analytics layer of every Pilot Build — not waiting for a wrong number to show up on a dashboard before knowing something is off.
Related articles
Building an ELT pipeline with dbt and Airflow: from zero to a production dashboard
A concrete roadmap from scattered data across multiple systems to a data warehouse you can trust — with dbt for transforms, Airflow for orchestration, and data contracts so teams don't break each other's dashboards.
Data digitization: from paper, Excel, and legacy systems to a single source of truth
Most data digitization projects fail not because OCR misreads characters, but because no one reconciles, normalizes, and merges the data afterward. A digitization pipeline architecture: OCR/ICR, validation, entity resolution/MDM, and an incremental migration strategy — why "one source of truth" matters more than "99% character accuracy".
AI + IoT in Smart Clean Vegetable Farms — sensors, models, and the automation loop
A VietGAP-certified vegetable farm no longer means the farmer stays up monitoring soil moisture or irrigating by feel. We break down the architecture combining multi-sensor IoT, an AI vision model for early pest and disease detection, and a closed-loop control system running at the edge — plus how VietGAP traceability becomes a natural by-product of comprehensive logging.