§ blog · Data Analytics06/12/2026

Building an ELT pipeline with dbt and Airflow: from zero to a production dashboard

A concrete roadmap from scattered data across multiple systems to a data warehouse you can trust — with dbt for transforms, Airflow for orchestration, and data contracts so teams don't break each other's dashboards.

Data AnalyticsdbtAirflowELT8 min read

By KonexForge Engineering Team

Most businesses don't lack data — they lack a reliable path from raw data to a number that decision-makers can look at and trust. This article describes the roadmap we recommend when building the Data Analytics layer in a Pilot Build: from data scattered across CRM, sales systems, and Excel files, to a structured warehouse with tests and self-updating dashboards.

Why ELT, not ETL

Traditional ETL (Extract-Transform-Load) transforms data before loading it into the warehouse — meaning the transform logic lives in an intermediate system that's often hard to test and hard to change. ELT (Extract-Load-Transform) loads raw data into the warehouse first, then transforms it with SQL directly inside the warehouse. With modern warehouses (BigQuery, Snowflake, ClickHouse, or Postgres for small-to-medium scale), compute is cheap enough to make this practical — and the benefit is that all transform logic becomes code that can be versioned, tested, and reviewed like regular software.

Three layers in the warehouse — staging, intermediate, marts

dbt (data build tool) organizes transforms into SQL models, typically split into three layers:

Staging: 1-1 with the raw source — only renames columns, casts data types, and filters out junk records. No business logic here
Intermediate: where tables from multiple sources are joined, deduplicated, or partially aggregated — for example, merging orders from a POS system and an online system into a single unified orders table
Marts: final tables, designed specifically for a particular group of business questions (revenue by store, monthly churn...) — these are the tables that BI tools connect to

Practical rule: if a number appears on 2 different dashboards, it must be computed in exactly one place — in a mart model — not recomputed with a separate formula in each dashboard.

Airflow's role — not to run SQL, but to orchestrate

A common misconception is that Airflow "runs the transforms". In reality, Airflow only orchestrates: it calls dbt run commands in the right order, at the right time, after source data is ready — and handles what happens when a step fails (retry, alert, backfill). The transform logic itself still lives in dbt. Keeping these two roles clearly separated helps you:

Swap orchestrators (Airflow → Dagster, or vice versa) without rewriting transforms
Run and test transforms locally without Airflow running
Debug more easily — "the transform is wrong" and "the job didn't run" are two different kinds of failures, at two different layers

Data contracts — agreements between teams

When multiple teams contribute to one warehouse, the easiest thing that can go wrong is one team changing a source schema (renaming a column, changing a data type) and breaking another team's dashboard — usually discovered days later, when someone asks why the numbers look wrong. A data contract is a simple convention: every important source table has an explicitly declared schema (dbt sources.yml), along with automated tests (not_null, unique, relationships) that run every time the pipeline executes. If a source changes and breaks the contract, the pipeline fails right at the staging step — before bad data reaches any dashboard.

A practical rollout roadmap — 4 phases

Phase 1 — Audit and data contracts (1-2 weeks)

List out all existing data sources, assess their reliability, and define the expected schema for each one. This is the cheapest phase but often produces the biggest findings — many "facts" about the data that the operations team believes turn out to be wrong once checked directly.

Phase 2 — Staging layer + minimal pipeline (2-3 weeks)

Bring the 2-3 most important sources into the warehouse, build staging models with basic tests, and a simple Airflow DAG that runs daily. The goal is to have one working end-to-end loop — even if it only serves a single business question — before expanding further.

Phase 3 — Marts + first dashboard (2-3 weeks)

Build 1-2 mart models that answer specific business questions (e.g., revenue and cancellation rate by day/region), connected to a dashboard (Superset, Metabase, or Looker Studio). This is the point where the value becomes tangible to end users — and usually where the next expansion request comes from.

Phase 4 — Expand + self-service (ongoing)

Add new data sources as needed, train the business team to build their own dashboards on existing marts (no complex SQL required), and set up alerting when dbt tests fail — this is the Optimization Loop applied to data infrastructure.

A note on tooling

dbt and Airflow are popular choices with a solid ecosystem, but they're not the only option — at smaller scale, a simple cron job calling a Python script + dbt core, or Dagster instead of Airflow, may be a more sensible choice with less operational overhead. Tooling should be chosen once data volume and the number of pipelines to manage are clear — not before.

Conclusion

A good ELT pipeline isn't judged by which tools it uses, but by whether: when data is wrong, the pipeline flags it right at the source, and when a new business question comes up, you don't need to rebuild from scratch. This is the standard we use when designing the Data Analytics layer for every Pilot Build — starting small, but with a structure solid enough to scale without a rebuild, as in the ETL pipeline merging telematics data for 600 trucks that we built.

Data Analytics

Monitoring data quality: from dbt tests to anomaly detection for analytics pipelines

An ELT pipeline running green every day doesn't mean the data is correct — structural tests only catch part of the problem. How to add freshness, volume, and anomaly-detection layers that catch data errors before someone notices a wrong number on a dashboard.

Data Analytics

China's GIS technology leads Asia: the concrete evidence, and why 'surpassing Japan' needs a closer look

China now holds 34.2% of the Asia-Pacific GIS market — the largest share in the region, ahead of Japan — with SuperMap leading software and Jilin-1/Gaofen leading commercial remote-sensing satellites. But "surpassing Japan" needs qualification: Japan still leads in micro-SAR satellites and HD autonomous-driving maps, and some direct comparisons simply don't exist in real sources.

Data Analytics

Data digitization: from paper, Excel, and legacy systems to a single source of truth

Most data digitization projects fail not because OCR misreads characters, but because no one reconciles, normalizes, and merges the data afterward. A digitization pipeline architecture: OCR/ICR, validation, entity resolution/MDM, and an incremental migration strategy — why "one source of truth" matters more than "99% character accuracy".

Have a similar problem to solve?

Contact the team

Building an ELT pipeline with dbt and Airflow: from zero to a production dashboard

Why ELT, not ETL

Three layers in the warehouse — staging, intermediate, marts

Airflow's role — not to run SQL, but to orchestrate

Data contracts — agreements between teams

A practical rollout roadmap — 4 phases

Phase 1 — Audit and data contracts (1-2 weeks)

Phase 2 — Staging layer + minimal pipeline (2-3 weeks)

Phase 3 — Marts + first dashboard (2-3 weeks)

Phase 4 — Expand + self-service (ongoing)

A note on tooling

Conclusion

Related articles

Monitoring data quality: from dbt tests to anomaly detection for analytics pipelines

China's GIS technology leads Asia: the concrete evidence, and why 'surpassing Japan' needs a closer look

Data digitization: from paper, Excel, and legacy systems to a single source of truth

Have a similar problem to solve?