ClickHouse for Financial Data: Candles, Footprints, and the Timezone Trap
When you’re building a trading dashboard, you need to answer queries like:
- “Give me all 1-minute candles for BTCUSDT between 9am and 5pm”
- “Show me the buy/sell volume breakdown at each price level for the last hour”
- “Enrich every candle with the open interest and funding rate at that time”
These are analytical queries — they aggregate millions of raw trades into summaries. ClickHouse is a database built specifically for this kind of work, and it handles these queries in milliseconds even on hundreds of millions of rows.
Let’s walk through exactly how to implement each of these.
First: Why ClickHouse Instead of Postgres or MySQL?#
Quick background: ClickHouse is a columnar database. Instead of storing each row (id, price, quantity, timestamp) together, it stores all price values together, all quantity values together, and so on.
When you run SELECT max(price) FROM trades, ClickHouse only reads the price column from disk — it skips every other column entirely. For analytics that touch 1-2 columns out of a table with 20 columns, this is dramatically faster than a row-based database.
The tradeoff: ClickHouse is not great for OLTP (single-row lookups, frequent small updates). For analytical pipelines ingesting millions of rows and running aggregations, it’s ideal.
The Table Setup: ReplacingMergeTree#
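As a sketch, a trades table along these lines would fit the queries in this article. The column names follow the ones used later (symbol, timestamp, aggregate_id, price, quantity, is_buyer_maker); the exact types and the partitioning clause are assumptions, not the original DDL:

```sql
-- Raw trades, one row per fill. Types and partitioning are illustrative.
CREATE TABLE trades
(
    symbol         LowCardinality(String),
    timestamp      UInt64,  -- trade time, epoch milliseconds
    aggregate_id   UInt64,  -- exchange-assigned trade id, monotonically increasing
    price          Float64,
    quantity       Float64,
    is_buyer_maker UInt8    -- 1 = buyer was the maker, 0 = buyer was the taker
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(toDateTime(intDiv(timestamp, 1000)))
ORDER BY (symbol, timestamp, aggregate_id);
```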
ReplacingMergeTree is ClickHouse’s deduplication engine. The ORDER BY clause defines both how data is sorted on disk and what makes a row “unique” — if the same (symbol, timestamp, aggregate_id) is inserted twice (e.g., from both the live WebSocket and a recovery job), ClickHouse keeps only one copy.
Important: deduplication happens in the background during “merges”. Until then, duplicates may exist. Add FINAL to any query that needs correct results:
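For example, a count over a symbol that may have been double-ingested would look like this (assuming the trades table sketched above):

```sql
-- Without FINAL, rows inserted twice (WebSocket + recovery job) may both
-- be counted until a background merge deduplicates them.
SELECT count()
FROM trades FINAL
WHERE symbol = 'BTCUSDT';
```

`FINAL` forces ClickHouse to apply the merge logic at read time, which costs some query speed but guarantees deduplicated results.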
Computing OHLCV Candles#
OHLCV stands for Open, High, Low, Close, Volume — the standard representation of price data over a time period.
The key insight: candles are not stored, they’re computed. You store raw trades, and ClickHouse computes the candle aggregation on the fly. This means any candle interval (1 minute, 5 minutes, 1 hour) is available without pre-aggregation.
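A sketch of the candle query, assuming the trades schema above; 60000 is the interval in milliseconds (here 1 minute), and the `?` placeholders are bind parameters for the requested time range:

```sql
-- 1-minute OHLCV candles with buy/sell volume split.
-- The +/-19800000 shift aligns candle boundaries to SLT (UTC+5:30).
SELECT
    intDiv(timestamp + 19800000, 60000) * 60000 - 19800000 AS open_time,
    argMin(price, aggregate_id)                            AS open,
    max(price)                                             AS high,
    min(price)                                             AS low,
    argMax(price, aggregate_id)                            AS close,
    sum(quantity)                                          AS volume,
    sumIf(quantity, is_buyer_maker = 0)                    AS buy_volume,
    sumIf(quantity, is_buyer_maker = 1)                    AS sell_volume
FROM trades FINAL
WHERE symbol = 'BTCUSDT'
  AND timestamp >= ? AND timestamp < ?
GROUP BY open_time
ORDER BY open_time;
```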
A few things worth explaining:
argMin(price, aggregate_id) — "give me the price value where aggregate_id is at its minimum within this group." That's the first trade in the candle — the open price. argMax(price, aggregate_id) does the same for the maximum — the last trade, the close price.
is_buyer_maker — in crypto, every trade has a “maker” (the order that was sitting on the order book) and a “taker” (the order that triggered the fill). When is_buyer_maker = 0, the buyer was the taker — meaning someone aggressively bought. When is_buyer_maker = 1, the buyer was the maker. Splitting volume this way tells you whether buyers or sellers were more aggressive.
The SLT Timezone Problem#
The +19800000 / -19800000 values in the candle query look like magic. Here's what they are:
Sri Lanka Standard Time (SLT) is UTC+5:30. In milliseconds: (5 × 3600 + 30 × 60) seconds × 1000 = 19,800,000 ms.
Why does this appear in a financial data query? Because candle boundaries are aligned to SLT midnight, not UTC midnight. A daily candle opens at 00:00 SLT = 18:30 UTC the previous day. Without the timezone correction, candle boundaries would fall at the wrong times.
The formula:
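As a sketch, the same alignment on the Go side might look like this (the function name is illustrative; integer division matches floor here because the timestamps are non-negative):

```go
package main

import "fmt"

// sltOffsetMs is UTC+5:30 expressed in milliseconds: (5*3600 + 30*60) * 1000.
const sltOffsetMs = 19_800_000

// alignToSLT rounds a UTC millisecond timestamp down to the nearest candle
// boundary of intervalMs, with boundaries anchored at SLT midnight.
func alignToSLT(tsMs, intervalMs int64) int64 {
	return (tsMs+sltOffsetMs)/intervalMs*intervalMs - sltOffsetMs
}

func main() {
	const day int64 = 86_400_000
	// The daily candle containing this timestamp opens at 18:30 UTC.
	fmt.Println(alignToSLT(1_000_000_000_000, day)) // 999973800000
}
```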
In plain English: shift the timestamp into SLT, round down to the nearest candle boundary, shift back to UTC. The result is the candle’s open time in UTC, but aligned to SLT boundaries.
Critical constraint: this formula must be identical on the server (Go + SQL) and the chart client (C++). Even a 1ms difference causes candles to render at the wrong position on the chart. This is a coordination invariant — changing the timezone offset requires simultaneously updating both sides.
Footprint Charts#
A footprint chart is a more detailed view of a candle. Instead of just OHLCV, it shows the buy/sell volume breakdown at each price level. This tells traders where volume clustered within a candle.
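A sketch of the footprint query, again assuming the trades schema above; the `?` placeholders are bind parameters for the tick size and the candle's time range:

```sql
-- Buy/sell volume at each price level within one candle.
-- floor(price / tickSize) * tickSize buckets prices into tick-sized levels.
SELECT
    floor(price / ?) * ?                AS price_level,
    sumIf(quantity, is_buyer_maker = 0) AS buy_volume,
    sumIf(quantity, is_buyer_maker = 1) AS sell_volume
FROM trades FINAL
WHERE symbol = 'BTCUSDT'
  AND timestamp >= ? AND timestamp < ?
GROUP BY price_level
ORDER BY price_level;
```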
The ? parameter for price rounding is the tickSize — for BTCUSDT it's 0.10 (prices move in $0.10 increments). floor(price / 0.10) * 0.10 snaps every price down to its $0.10 bucket.
For a busy 1-hour BTCUSDT candle, this might return 400 rows — one per $0.10 level where trades occurred. The chart renders this as a price ladder with buy/sell bars at each level.
LOCF Enrichment (Last Observation Carried Forward)#
“LOCF” is a data technique for filling in missing values. Context: market stats like open interest and funding rate arrive via REST API every 30–60 seconds. They don’t align to candle boundaries. When you want to enrich every 1-minute candle with the OI at that time, you need to find the nearest preceding OI reading.
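A minimal sketch of the lookup, assuming snapshots sorted ascending by timestamp; the type and field names are illustrative, not the actual server code:

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is one market-stats reading (names are illustrative).
type Snapshot struct {
	TsMs         int64
	OpenInterest float64
}

// locf returns the most recent snapshot at or before tsMs, or nil if no
// snapshot existed yet. snaps must be sorted ascending by TsMs.
func locf(snaps []Snapshot, tsMs int64) *Snapshot {
	// sort.Search finds the first index whose timestamp is strictly after
	// tsMs; the entry just before it is the last observation known at tsMs.
	i := sort.Search(len(snaps), func(i int) bool { return snaps[i].TsMs > tsMs })
	if i == 0 {
		return nil
	}
	return &snaps[i-1]
}

func main() {
	snaps := []Snapshot{{1000, 50.0}, {31000, 51.5}, {62000, 49.8}}
	// A candle opening at t=60000 is enriched with the reading from t=31000.
	s := locf(snaps, 60000)
	fmt.Println(s.OpenInterest) // 51.5
}
```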
sort.Search does binary search — O(log n) — so this runs in microseconds even on thousands of candles. The pattern: “give me the most recent data point that was known at the time of this candle.” That’s LOCF — you carry the last observed value forward until a new one arrives.
Putting It Together: An Enriched Candle Response#
A single gRPC call to GetCandles produces candles that combine data from four sources:
- Raw trades → OHLCV (ClickHouse query)
- Market stats snapshots → OI, funding rate (Redis → LOCF enrichment)
- Long/short ratio → sentiment indicator (REST poller)
- Predicted liquidation levels → OI-derived model output (ClickHouse query)
All of this is assembled in Go and sent to the chart client in one response. The chart has everything it needs to draw a fully-featured candlestick chart with overlays — no additional requests.
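To make the four-source assembly concrete, here is a hypothetical shape for the response item; every field name here is an assumption for illustration — the actual gRPC message definition is not shown in this article:

```go
package main

import "fmt"

// EnrichedCandle combines the four data sources into one response item.
// Field names are illustrative, not the real protobuf schema.
type EnrichedCandle struct {
	OpenTimeMs             int64     // SLT-aligned open time, UTC ms
	Open, High, Low, Close float64   // from the ClickHouse trades query
	Volume                 float64
	BuyVolume, SellVolume  float64   // is_buyer_maker split
	OpenInterest           float64   // LOCF-enriched from market stats
	FundingRate            float64   // LOCF-enriched from market stats
	LongShortRatio         float64   // from the REST poller
	PredictedLiqLevels     []float64 // OI-derived model output
}

func main() {
	c := EnrichedCandle{OpenTimeMs: 999973800000, Close: 42010.5, OpenInterest: 51.5}
	fmt.Println(c.OpenInterest) // 51.5
}
```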
This is the advantage of computing candles at query time rather than pre-aggregating: you can always re-enrich them with new data sources without changing the storage layer.