When I started building trading systems, one question kept coming up: where do you store all those candles?

The Scale Problem

When you're prototyping a trading bot with a few thousand candles, everything works. Pandas loads it in memory, queries are instant, life is good.

Then you go to production. You're tracking 10,000 symbols at 1-minute resolution. That's over 500 million candles per year: for US equities alone, 10,000 symbols × 390 trading minutes × 252 trading days works out to nearly a billion. Many general-purpose databases struggle at that scale. Queries that should take milliseconds stretch to 30 seconds. Backtests that should take minutes run overnight.

I wanted to avoid this from the start.

Why I Went With QuestDB

The benchmarks were compelling.

QuestDB runs filter queries on 1.2 billion rows in about 0.1 seconds. The same query on TimescaleDB takes around 4.4 seconds. ClickHouse comes in at about 1 second. That's a significant difference on identical hardware.

Ingestion rates tell a similar story—5-8x faster than both competitors.

For OHLCV data, it was an easy choice.

What Makes It Work

A few things matter specifically for financial time-series:

The SYMBOL type. Instead of storing "ADBE" as a string millions of times, it stores it once and references it by index. Simple idea, but not all databases handle this efficiently.

Automatic time partitioning. Declare PARTITION BY DAY (or MONTH) once in your DDL and QuestDB manages the partitions from there. Query last week? It doesn't scan last year's data (see the sketch after this list).

Column storage. Calculating moving averages? You only need the close column. QuestDB reads just that column. Row-based databases read everything.

Zero-GC in hot paths. The performance-critical code avoids garbage collection. This matters when you're ingesting thousands of candles per second from exchange websockets.
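
To make partition pruning and column reads concrete, here's the shape of query they reward, written against the table I define below. Pull a month of closes for one symbol and QuestDB reads just the timestamp and close columns from June's partitions; timestamp IN '2024-06' is its interval-filter shorthand.

-- Touches two columns and one month of daily partitions, nothing else
SELECT timestamp, close
FROM ohlcv
WHERE symbol = 'AAPL' AND timestamp IN '2024-06'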

My Setup

The table I've been using:

CREATE TABLE ohlcv (
    symbol SYMBOL CAPACITY 10000 CACHE,
    timestamp TIMESTAMP,
    open DOUBLE,
    high DOUBLE,
    low DOUBLE,
    close DOUBLE,
    volume DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY WAL;

SYMBOL handles the ticker efficiently, PARTITION BY DAY keeps things organized, WAL gives durability.

For ingestion from exchange websockets, the InfluxDB Line Protocol on port 9009 keeps things simple—just TCP, capable of handling millions of rows per second:

line = f"ohlcv,symbol={symbol} open={o},high={h},low={l},close={c},volume={v} {timestamp_ns}\n"
sock.sendall(line.encode())
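
If you'd rather not hand-roll protocol strings, QuestDB also ships an official Python client that handles buffering and escaping. A minimal sketch, assuming a recent questdb package with the conf-string Sender API; the candle values here are placeholders:

from questdb.ingress import Sender, TimestampNanos  # pip install questdb

# Same ILP endpoint as the raw-socket version above
with Sender.from_conf("tcp::addr=localhost:9009;") as sender:
    sender.row(
        "ohlcv",
        symbols={"symbol": "ETHUSD"},
        columns={"open": 3001.5, "high": 3010.0, "low": 2995.2,
                 "close": 3005.8, "volume": 12.4},
        at=TimestampNanos.now(),
    )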

For queries, the PostgreSQL wire protocol means any SQL client works—SQLAlchemy, psycopg2, whatever you prefer.
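
A quick sketch of the query side with psycopg2, assuming QuestDB's defaults (PGWire on port 8812, user admin, password quest, database qdb):

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(host="localhost", port=8812,
                        user="admin", password="quest", dbname="qdb")
with conn.cursor() as cur:
    cur.execute("SELECT symbol, last(close) FROM ohlcv")
    for symbol, close in cur.fetchall():
        print(symbol, close)
conn.close()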

Queries I Run Daily

-- Date range for backtesting
SELECT * FROM ohlcv
WHERE symbol = 'AAPL'
AND timestamp BETWEEN '2024-01-01' AND '2024-06-01'
ORDER BY timestamp

-- Latest prices across all symbols
SELECT symbol, last(close), last(timestamp)
FROM ohlcv
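
-- Same result via QuestDB's LATEST ON clause, a QuestDB-specific
-- extension that returns the most recent row per symbol
SELECT * FROM ohlcv LATEST ON timestamp PARTITION BY symbol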

-- Resample 1m candles to 1h
SELECT
    timestamp,
    first(open) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close) AS close,
    sum(volume) AS volume
FROM ohlcv
WHERE symbol = 'ETHUSD'
SAMPLE BY 1h

The SAMPLE BY syntax is native to QuestDB. No window functions or complex GROUP BY needed—just specify the interval.
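
One wrinkle worth knowing: thinly traded symbols leave empty buckets, and SAMPLE BY accepts a FILL clause for them. FILL(PREV), for instance, carries the last value forward:

SELECT timestamp, last(close) AS close
FROM ohlcv
WHERE symbol = 'ETHUSD'
SAMPLE BY 1h FILL(PREV)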

The Tradeoffs

It's not the right fit for everything.

Complex joins across multiple tables? Can be tricky. QuestDB is optimized for single-table time-series workloads.

ACID transactions across multiple operations? Not its strength. It's built for append-heavy workloads.

But for time-series storage? It handles it well.

Bottom Line

Three years of storing tick data and candles, running backtests, and powering dashboards. It's been reliable throughout. Backtests run quickly, and dashboards feel responsive.

If you're working with time-series data, QuestDB is worth considering.

Need help setting up OHLCV storage? Get in touch.