§ blog · Server & Database06/08/2026

Database architecture for IoT systems: when to use TimescaleDB, when to use ClickHouse

The two most popular time-series engines for sensor data — selection criteria, schema design, and a data lifecycle strategy so costs don't explode with device count.

Server & DatabaseTimescaleDBClickHouseIoT7 min read

By KonexForge Engineering Team

One of the first questions when designing an IoT system is: where will sensor data — millions of data points per day, from hundreds or thousands of devices — be stored? A wrong answer at this stage usually doesn't cause an immediate failure, but quietly accumulates into infrastructure costs and query latency after 6-12 months of operation. This article describes how we approach this problem when designing a Pilot Build for an industrial sensor monitoring system — not abstract theory, but a concrete architecture you can apply right away.

Two leading candidates: TimescaleDB and ClickHouse

TimescaleDB is a PostgreSQL extension that turns a regular table into a "hypertable" — automatically partitioned by time. Its biggest advantage is that you still get the full PostgreSQL toolkit: foreign key constraints, transactions, JOINs with metadata tables (device lists, installation locations, alert thresholds), and an ORM/driver ecosystem familiar to every backend engineer.

ClickHouse is a columnar database purpose-built for analytics — fast reads across billions of rows, very good compression (typically 5-10× compared to row-based storage) — but with trade-offs: no real transactions, slow and complex UPDATE/DELETE, and JOINs between large tables aren't its strength.

Selection criteria — not "which is better" but "which fits the problem"

Device count under 5,000 and write frequency under 1 point/second/device → TimescaleDB is usually enough, and simplifies operations since you only need one Postgres instance
Large device counts (tens of thousands+) or long-term storage of high-frequency raw data (vibration, audio, tens of Hz) → ClickHouse delivers noticeably better read performance and compression ratios
Need a low-latency, real-time operations dashboard (operators viewing live metrics) → TimescaleDB with continuous aggregates handles this well without adding another system
Need long-term historical analysis, BI, or ML model training on raw data → ClickHouse is the natural choice for the data warehouse layer

In many cases, the real answer is both — but playing two different roles within the same architecture, not replacing each other.

A reference architecture: hot path + cold path

The approach we typically propose for an industrial sensor monitoring system (e.g., vibration, temperature, pressure from a few thousand sensors) is to split storage into two layers with clearly defined roles:

Hot path — TimescaleDB

Data from the last 7-30 days, downsampled to a resolution suitable for operations dashboards (e.g., 1 point/minute instead of 1 point/second). Timescale's continuous aggregates pre-compute average/min/max values per time window, so the dashboard loads instantly without scanning raw data. This is the layer that operators, the alert engine, and the application's API interact with directly.

Cold path — ClickHouse or object storage (Parquet on S3/R2)

Full raw data, retained long-term for analysis, audit, and as training data for forecasting/anomaly-detection models. A batch job (running daily or hourly) pushes "cooled" data from Timescale to ClickHouse or down to Parquet, then deletes it from the hot path to keep the hot table small and queries fast.

Design principle: whatever layer serves real-time operations must be small and fast; whatever layer serves analysis/training can be large and cheap — but doesn't need to be instantly fast.

Schema design — a few common pitfalls

Always keep a separate devices table (metadata: location, model, alert thresholds, firmware version) and a time-series table containing only device_id, timestamp, metric, value. Avoid stuffing metadata into every measurement row — it wastes storage and makes updates painful when a device's configuration changes
Use the narrowest data type possible for value — float4 instead of float8/numeric if precision allows, because the difference multiplied across billions of rows is significant
Set a retention policy from day one, not when the disk fills up. TimescaleDB has add_retention_policy() to automatically drop old chunks; ClickHouse has TTL at the table/partition level
Indexing on (device_id, timestamp) is nearly mandatory — this is the most common query pattern ("get data for device X over time range Y")

Cost — a factor often overlooked until it's too late

A common mistake is designing a system that performs well at pilot scale (a few dozen devices) but whose storage costs grow linearly — or worse, super-linearly as indexes balloon — once scaled to thousands of devices. Timescale's compression (column-oriented storage for older chunks) can cut storage by 90%+ compared to a regular heap, but it must be explicitly configured (compress_chunk policy) — it isn't enabled by default in every version.

Conclusion

There's no single correct answer for every IoT system — but there is a right question to answer first: who will read this data, at what latency, and for how long? Answering that before choosing a database helps avoid having to migrate the entire storage system midstream — one of the most expensive and risky tasks in operating a data system. This is also part of the Discovery Sprint we run before every Pilot Build: mapping out data flows and expected volumes before writing a single line of infrastructure-as-code.

If your team is designing — or struggling with cost/latency issues on — an existing time-series system, this is exactly the kind of problem that falls within the Server & Database layer of the closed loop KonexForge builds — similar to how ClickHouse serves as the cold path in the water-quality monitoring system for 12 districts we deployed.

Server & Database

Zero-downtime database migration: dual-write, backfill, and a safe cutover for a live system

Swapping a database engine, splitting a schema, or moving cloud providers for a system already serving real customers can no longer be solved with a midnight maintenance window. The four stages of a safe migration — dual-write, backfill, verify, cutover — and the most common trap at each step.

Server & Database

Kubernetes for ML and data workloads: designing a cluster from node pools to GPU scheduling

Kubernetes is the standard for container orchestration — but default configurations don't hold up under ML workloads. GPU nodes need dedicated taints and tolerations; training jobs can starve the entire cluster without resource quotas; model serving needs PodDisruptionBudgets to avoid mid-inference interruptions. Four common problems when running ML on Kubernetes and how to design the cluster correctly from the start.

Optimization Loop

The technology behind every KonexForge Pilot Build: one 6-layer system, not 6 separate products

Most services pages list technology like a spec sheet — MQTT, Kubernetes, dbt, PyTorch, React, Grafana. But technology only creates value when it's assembled correctly into a closed-loop system. An overview of the 6 layers KonexForge builds with for clients, backed by real evidence from delivered projects.

Have a similar problem to solve?

Contact the team