ClickHouse for Financial Data: Candles, Footprints, and the Timezone Trap
When you’re building a trading dashboard, you need to answer queries like:
- “Give me all 1-minute candles for BTCUSDT between 9am and 5pm”
- “Show me the buy/sell volume breakdown at each price level for the last hour”
- “Enrich every candle with the open interest and funding rate at that time”
These are analytical queries — they aggregate millions of raw trades into summaries. ClickHouse is a database built specifically for this kind of work, and it handles these queries in milliseconds even on hundreds of millions of rows.
Let’s walk through exactly how to implement each of these.
First: Why ClickHouse Instead of Postgres or MySQL?#
Quick background: ClickHouse is a columnar database. Instead of storing each row (id, price, quantity, timestamp) together, it stores all price values together, all quantity values together, and so on.
When you run SELECT max(price) FROM trades, ClickHouse only reads the price column from disk — it skips every other column entirely. For analytics that touch 1-2 columns out of a table with 20 columns, this is dramatically faster than a row-based database.
The tradeoff: ClickHouse is not great for OLTP (single-row lookups, frequent small updates). For analytical pipelines ingesting millions of rows and running aggregations, it’s ideal.
The Table Setup: ReplacingMergeTree#
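As a sketch, a trades table along these lines would fit the queries in this article. The column names follow the ones used later (symbol, timestamp, aggregate_id, price, quantity, is_buyer_maker); the exact types and the partitioning clause are assumptions, not the original DDL:

```sql
-- Raw trades, one row per fill. Types and partitioning are illustrative.
CREATE TABLE trades
(
    symbol         LowCardinality(String),
    timestamp      UInt64,  -- trade time, epoch milliseconds
    aggregate_id   UInt64,  -- exchange-assigned trade id, monotonically increasing
    price          Float64,
    quantity       Float64,
    is_buyer_maker UInt8    -- 1 = buyer was the maker, 0 = buyer was the taker
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(toDateTime(intDiv(timestamp, 1000)))
ORDER BY (symbol, timestamp, aggregate_id);
```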
ReplacingMergeTree is ClickHouse’s deduplication engine. The ORDER BY clause defines both how data is sorted on disk and what makes a row “unique” — if the same (symbol, timestamp, aggregate_id) is inserted twice (e.g., from both the live WebSocket and a recovery job), ClickHouse keeps only one copy.
Important: deduplication happens in the background during “merges”. Until then, duplicates may exist. Add FINAL to any query that needs correct results:
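For example, a count over a symbol that may have been double-ingested would look like this (assuming the trades table sketched above):

```sql
-- Without FINAL, rows inserted twice (WebSocket + recovery job) may both
-- be counted until a background merge deduplicates them.
SELECT count()
FROM trades FINAL
WHERE symbol = 'BTCUSDT';
```

`FINAL` forces ClickHouse to apply the merge logic at read time, which costs some query speed but guarantees deduplicated results.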
Computing OHLCV Candles#
OHLCV stands for Open, High, Low, Close, Volume — the standard representation of price data over a time period.
The key insight: candles are not stored, they’re computed. You store raw trades, and ClickHouse computes the candle aggregation on the fly. This means any candle interval (1 minute, 5 minutes, 1 hour) is available without pre-aggregation.
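A sketch of the candle query, assuming the trades schema above; 60000 is the interval in milliseconds (here 1 minute), and the `?` placeholders are bind parameters for the requested time range:

```sql
-- 1-minute OHLCV candles with buy/sell volume split.
-- The +/-19800000 shift aligns candle boundaries to SLT (UTC+5:30).
SELECT
    intDiv(timestamp + 19800000, 60000) * 60000 - 19800000 AS open_time,
    argMin(price, aggregate_id)                            AS open,
    max(price)                                             AS high,
    min(price)                                             AS low,
    argMax(price, aggregate_id)                            AS close,
    sum(quantity)                                          AS volume,
    sumIf(quantity, is_buyer_maker = 0)                    AS buy_volume,
    sumIf(quantity, is_buyer_maker = 1)                    AS sell_volume
FROM trades FINAL
WHERE symbol = 'BTCUSDT'
  AND timestamp >= ? AND timestamp < ?
GROUP BY open_time
ORDER BY open_time;
```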
A few things worth explaining:
argMin(price, aggregate_id) — "give me the price value where aggregate_id is at its minimum within this group." That's the first trade in the candle — the open price. argMax(price, aggregate_id) does the same for the maximum — the last trade, the close price.
is_buyer_maker — in crypto, every trade has a “maker” (the order that was sitting on the order book) and a “taker” (the order that triggered the fill). When is_buyer_maker = 0, the buyer was the taker — meaning someone aggressively bought. When is_buyer_maker = 1, the buyer was the maker. Splitting volume this way tells you whether buyers or sellers were more aggressive.
The SLT Timezone Problem#
The +19800000 / -19800000 values in the candle query look like magic. Here's what they are:
Sri Lanka Standard Time (SLT) is UTC+5:30. In milliseconds: (5 × 3600 + 30 × 60) seconds × 1000 = 19,800,000 ms.
Why does this appear in a financial data query? Because candle boundaries are aligned to SLT midnight, not UTC midnight. A daily candle opens at 00:00 SLT = 18:30 UTC the previous day. Without the timezone correction, candle boundaries would fall at the wrong times.
The formula:
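As a sketch, the same alignment on the Go side might look like this (the function name is illustrative; integer division matches floor here because the timestamps are non-negative):

```go
package main

import "fmt"

// sltOffsetMs is UTC+5:30 expressed in milliseconds: (5*3600 + 30*60) * 1000.
const sltOffsetMs = 19_800_000

// alignToSLT rounds a UTC millisecond timestamp down to the nearest candle
// boundary of intervalMs, with boundaries anchored at SLT midnight.
func alignToSLT(tsMs, intervalMs int64) int64 {
	return (tsMs+sltOffsetMs)/intervalMs*intervalMs - sltOffsetMs
}

func main() {
	const day int64 = 86_400_000
	// The daily candle containing this timestamp opens at 18:30 UTC.
	fmt.Println(alignToSLT(1_000_000_000_000, day)) // 999973800000
}
```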
In plain English: shift the timestamp into SLT, round down to the nearest candle boundary, shift back to UTC. The result is the candle’s open time in UTC, but aligned to SLT boundaries.
Critical constraint: this formula must be identical on the server (Go + SQL) and the chart client (C++). Even a 1ms difference causes candles to render at the wrong position on the chart. This is a coordination invariant — changing the timezone offset requires simultaneously updating both sides.
Footprint Charts#
A footprint chart is a more detailed view of a candle. Instead of just OHLCV, it shows the buy/sell volume breakdown at each price level. This tells traders where volume clustered within a candle.
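A sketch of the footprint query, again assuming the trades schema above; the `?` placeholders are bind parameters for the tick size and the candle's time range:

```sql
-- Buy/sell volume at each price level within one candle.
-- floor(price / tickSize) * tickSize buckets prices into tick-sized levels.
SELECT
    floor(price / ?) * ?                AS price_level,
    sumIf(quantity, is_buyer_maker = 0) AS buy_volume,
    sumIf(quantity, is_buyer_maker = 1) AS sell_volume
FROM trades FINAL
WHERE symbol = 'BTCUSDT'
  AND timestamp >= ? AND timestamp < ?
GROUP BY price_level
ORDER BY price_level;
```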
The ? parameter for price rounding is the tickSize — for BTCUSDT it's 0.10 (prices move in $0.10 increments). floor(price / 0.10) * 0.10 snaps every price down to its $0.10 bucket.
For a busy 1-hour BTCUSDT candle, this might return 400 rows — one per $0.10 level where trades occurred. The chart renders this as a price ladder with buy/sell bars at each level.
LOCF Enrichment (Last Observation Carried Forward)#
“LOCF” is a data technique for filling in missing values. Context: market stats like open interest and funding rate arrive via REST API every 30–60 seconds. They don’t align to candle boundaries. When you want to enrich every 1-minute candle with the OI at that time, you need to find the nearest preceding OI reading.
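A minimal sketch of the lookup, assuming snapshots sorted ascending by timestamp; the type and field names are illustrative, not the actual server code:

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is one market-stats reading (names are illustrative).
type Snapshot struct {
	TsMs         int64
	OpenInterest float64
}

// locf returns the most recent snapshot at or before tsMs, or nil if no
// snapshot existed yet. snaps must be sorted ascending by TsMs.
func locf(snaps []Snapshot, tsMs int64) *Snapshot {
	// sort.Search finds the first index whose timestamp is strictly after
	// tsMs; the entry just before it is the last observation known at tsMs.
	i := sort.Search(len(snaps), func(i int) bool { return snaps[i].TsMs > tsMs })
	if i == 0 {
		return nil
	}
	return &snaps[i-1]
}

func main() {
	snaps := []Snapshot{{1000, 50.0}, {31000, 51.5}, {62000, 49.8}}
	// A candle opening at t=60000 is enriched with the reading from t=31000.
	s := locf(snaps, 60000)
	fmt.Println(s.OpenInterest) // 51.5
}
```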
sort.Search does binary search — O(log n) — so this runs in microseconds even on thousands of candles. The pattern: “give me the most recent data point that was known at the time of this candle.” That’s LOCF — you carry the last observed value forward until a new one arrives.
Putting It Together: An Enriched Candle Response#
A single gRPC call to GetCandles produces candles that combine data from four sources:
- Raw trades → OHLCV (ClickHouse query)
- Market stats snapshots → OI, funding rate (Redis → LOCF enrichment)
- Long/short ratio → sentiment indicator (REST poller)
- Predicted liquidation levels → OI-derived model output (ClickHouse query)
All of this is assembled in Go and sent to the chart client in one response. The chart has everything it needs to draw a fully-featured candlestick chart with overlays — no additional requests.
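To make the four-source assembly concrete, here is a hypothetical shape for the response item; every field name here is an assumption for illustration — the actual gRPC message definition is not shown in this article:

```go
package main

import "fmt"

// EnrichedCandle combines the four data sources into one response item.
// Field names are illustrative, not the real protobuf schema.
type EnrichedCandle struct {
	OpenTimeMs             int64     // SLT-aligned open time, UTC ms
	Open, High, Low, Close float64   // from the ClickHouse trades query
	Volume                 float64
	BuyVolume, SellVolume  float64   // is_buyer_maker split
	OpenInterest           float64   // LOCF-enriched from market stats
	FundingRate            float64   // LOCF-enriched from market stats
	LongShortRatio         float64   // from the REST poller
	PredictedLiqLevels     []float64 // OI-derived model output
}

func main() {
	c := EnrichedCandle{OpenTimeMs: 999973800000, Close: 42010.5, OpenInterest: 51.5}
	fmt.Println(c.OpenInterest) // 51.5
}
```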
This is the advantage of computing candles at query time rather than pre-aggregating: you can always re-enrich them with new data sources without changing the storage layer.