What is orderbook depth data and how is it different from OHLCV?

OHLCV (Open-High-Low-Close-Volume) candles summarize only the price range and traded volume of each interval. Orderbook depth data shows the bid and ask liquidity actually resting at multiple price levels: how much buying and selling interest exists, and how far from the current price it sits. Our dataset provides 10 levels of depth on both sides, measured as cumulative volume and as basis-point distance from the mid-price. This is the raw material that institutional traders and quantitative researchers use to detect order flow, predict short-term price movements, and model realistic execution costs.

Which instruments and timeframe does the dataset cover?

The current release covers 24 major crypto perpetual futures sourced from the leading L1 perpetual decentralized exchange, Hyperliquid: BTC, ETH, SOL, BNB, XRP, DOGE, ADA, AVAX, LINK, DOT, NEAR, SUI, OP, ARB, SEI, TIA, INJ, APT, FIL, LTC, ETC, WIF, XLM, and ATOM. All data is aggregated into 5-minute bars spanning March 2025 to February 2026, approximately 12 months of continuous coverage. Each instrument has ~96,000 bars with 47 derived columns per row, including OHLCV, bid/ask volumes, and bid/ask distances at 10 depth levels.

Can I use this data for machine learning and deep learning?

Absolutely. The dataset is specifically designed for ML workflows. The 47-column schema feeds directly into LSTM networks, Transformer architectures, reinforcement learning environments (like Gymnasium/Stable-Baselines3), and gradient-boosted models (XGBoost, LightGBM). Common derived features include bid-ask imbalance ratios, depth-weighted pressure scores, liquidity concentration metrics, and spread dynamics — all computable in a few lines of Python from our raw columns.
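As a rough illustration of that claim, here is a minimal sketch of the four feature families. It assumes volume columns named `bid_volume_level_k` / `ask_volume_level_k` (the level-1 names appear in this article's backtest example) holding cumulative volume out to level k, and hypothetically named `bid_distance_level_k` / `ask_distance_level_k` columns holding distance from mid in basis points. The 1 / (1 + bps) weighting for "pressure" is one arbitrary choice, not a standard definition.

```python
import numpy as np
import pandas as pd


def add_depth_features(df: pd.DataFrame, levels: int = 10) -> pd.DataFrame:
    """Return a copy of df with a few common orderbook features added.

    Column names for distances are hypothetical; volumes are assumed cumulative.
    """
    out = df.copy()
    ks = range(1, levels + 1)
    bid_cum = out[[f"bid_volume_level_{k}" for k in ks]].to_numpy(dtype=float)
    ask_cum = out[[f"ask_volume_level_{k}" for k in ks]].to_numpy(dtype=float)
    bid_dist = np.abs(out[[f"bid_distance_level_{k}" for k in ks]].to_numpy(dtype=float))
    ask_dist = np.abs(out[[f"ask_distance_level_{k}" for k in ks]].to_numpy(dtype=float))

    # Per-level (incremental) size, recovered from the cumulative columns
    bid_lvl = np.diff(bid_cum, axis=1, prepend=0.0)
    ask_lvl = np.diff(ask_cum, axis=1, prepend=0.0)

    # 1) Bid-ask imbalance in [-1, 1]: at the touch and across the visible book
    out["imbalance_l1"] = (bid_cum[:, 0] - ask_cum[:, 0]) / (bid_cum[:, 0] + ask_cum[:, 0])
    out["imbalance_full"] = (bid_cum[:, -1] - ask_cum[:, -1]) / (bid_cum[:, -1] + ask_cum[:, -1])

    # 2) Depth-weighted pressure: liquidity closer to mid gets more weight
    bid_w = (bid_lvl / (1.0 + bid_dist)).sum(axis=1)
    ask_w = (ask_lvl / (1.0 + ask_dist)).sum(axis=1)
    out["weighted_pressure"] = (bid_w - ask_w) / (bid_w + ask_w)

    # 3) Liquidity concentration: share of visible depth sitting at level 1
    out["bid_concentration"] = bid_cum[:, 0] / bid_cum[:, -1]
    out["ask_concentration"] = ask_cum[:, 0] / ask_cum[:, -1]

    # 4) Spread dynamics: best bid + best ask distance from mid, and its change per bar
    out["spread_bps"] = bid_dist[:, 0] + ask_dist[:, 0]
    out["spread_change_bps"] = out["spread_bps"].diff()
    return out


# Tiny two-level example row
demo = pd.DataFrame({
    "bid_volume_level_1": [3.0], "bid_volume_level_2": [5.0],
    "ask_volume_level_1": [1.0], "ask_volume_level_2": [5.0],
    "bid_distance_level_1": [1.0], "bid_distance_level_2": [3.0],
    "ask_distance_level_1": [1.0], "ask_distance_level_2": [3.0],
})
feats = add_depth_features(demo, levels=2)
```

If the volume columns turn out to be per-level rather than cumulative, drop the `np.diff` step and take cumulative sums for the concentration ratios instead.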

Can you create custom datasets with different instruments, timeframes, or depth levels?

Yes. We can generate orderbook depth data for any cryptocurrency available on major derivatives DEXs and CEXs, at any candle interval (1-minute, 5-minute, 15-minute, 1-hour, etc.), covering any historical time period. We also support extended depth up to 30 levels. Contact us at imbalancelabs@gmail.com with your requirements for a custom quote.

Is this data legally safe to use?

Yes. Our datasets are classified as Derived Data — an Aggregated Liquidity and Orderbook Depth Index. All raw order book snapshots have been aggregated across time intervals, normalized, and transformed through statistical computations. The original tick-level data is not recoverable from our product. This classification places it outside the scope of most exchange data redistribution restrictions.

What format is the data delivered in?

All datasets are delivered as compressed CSV files (.csv.gz), which can be loaded directly by Python pandas, R, DuckDB, Apache Spark, and most data analysis tools. Each file is named by instrument (e.g., BTC_5m_depth10_derived.csv.gz). The full dataset ZIP containing all 24 instruments is approximately 300 MB.
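A minimal loading sketch in pandas, which decompresses `.gz` files transparently because `compression="infer"` is the default. The `<SYMBOL>_5m_depth10_derived.csv.gz` pattern is extrapolated from the BTC example above and may not match every file; the demo writes two tiny stand-in files so the snippet runs without the real data.

```python
import glob
import os
import tempfile

import pandas as pd


def load_all(folder: str, pattern: str = "*_5m_depth10_derived.csv.gz") -> pd.DataFrame:
    """Read every per-instrument file in `folder` into one long DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    frames = []
    for path in paths:
        symbol = os.path.basename(path).split("_")[0]  # "BTC_5m_..." -> "BTC"
        # read_csv infers gzip compression from the .gz suffix
        frames.append(pd.read_csv(path).assign(symbol=symbol))
    return pd.concat(frames, ignore_index=True)


# Demo with two tiny stand-in files instead of the real dataset
with tempfile.TemporaryDirectory() as tmp:
    stand_in = pd.DataFrame({"close_price": [100.0, 101.0]})
    for sym in ("BTC", "ETH"):
        stand_in.to_csv(os.path.join(tmp, f"{sym}_5m_depth10_derived.csv.gz"), index=False)
    combined = load_all(tmp)
```

A single long frame with a `symbol` column is convenient for cross-instrument work; for one instrument, a plain `pd.read_csv(path)` is enough.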

Backtesting · 9 min read · Jun 21, 2026

Crypto Backtesting Explained: What a Backtest Reveals (and the Data It Needs)

By Imbalance Labs Research

TL;DR

A backtest is a controlled experiment on historical data. It reveals an equity curve, drawdown, Sharpe ratio, win rate and regime sensitivity, but only for the reality your data encodes. Backtest on price-only candles and you measure a fantasy with zero slippage and unlimited liquidity. Backtest on Level 2 orderbook depth and you measure something close to what live trading will actually do. If you're shopping for a backtester, the harder problem is usually the data feeding it.

What a Backtest Actually Is

A backtest replays a trading rule over historical market data and records what would have happened. You define a signal (“go long when this condition is true”), an execution model (how orders fill), and a cost model (fees and slippage). The backtester walks the data bar by bar, simulates the trades, and produces a performance record. Done honestly, it's the cheapest way to reject a bad idea before it costs real money.

Done carelessly, it's the most expensive form of self-deception in quant finance. The difference is rarely the backtesting engine — it's the assumptions and the data.

What a Backtest Reveals

A good backtest surfaces far more than a single return number. The metrics that actually matter:

Equity curve — the shape of growth over time, and whether it came from a few lucky trades or a consistent edge.
Maximum drawdown — the worst peak-to-trough loss. It decides whether you could actually hold the strategy through the pain.
Sharpe ratio / risk-adjusted return: return per unit of volatility. A 40% return at 80% annualized volatility (Sharpe ≈ 0.5, ignoring the risk-free rate) is worse than 15% at 8% volatility (≈ 1.9).
Win rate & turnover — how often you're right, and how often you trade. High turnover makes the strategy exquisitely sensitive to costs.
Regime sensitivity — does the edge survive bull, bear, and chop, or only one of them?
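A minimal sketch computing these metrics from per-bar net strategy returns and the position held in each bar. Assumptions: 5-minute bars in a 24/7 market (288 × 365 bars per year for annualizing Sharpe), a zero risk-free rate, and a bar-level win rate (share of in-market bars with a positive return); a per-trade win rate would need bars grouped into trades first. Regime sensitivity can be checked by running the same report on slices of history.

```python
import numpy as np
import pandas as pd

BARS_PER_YEAR = 288 * 365  # 5-minute bars, markets open 24/7


def performance_report(strategy_returns: pd.Series, position: pd.Series,
                       bars_per_year: int = BARS_PER_YEAR) -> dict:
    """Summarize a backtest from per-bar net returns and per-bar positions."""
    equity = (1.0 + strategy_returns).cumprod()
    # Treat the starting capital (1.0) as the first peak
    peak = np.maximum(equity.cummax(), 1.0)
    drawdown = equity / peak - 1.0
    vol = strategy_returns.std(ddof=1)
    sharpe = np.sqrt(bars_per_year) * strategy_returns.mean() / vol if vol > 0 else float("nan")
    in_market = position != 0
    win_rate = float((strategy_returns[in_market] > 0).mean()) if in_market.any() else float("nan")
    # Turnover as a count of position changes, including the first entry from flat
    changes = float(np.abs(np.diff(position.to_numpy(dtype=float), prepend=0.0)).sum())
    return {
        "total_return": float(equity.iloc[-1] - 1.0),
        "max_drawdown": float(drawdown.min()),
        "sharpe": float(sharpe),
        "win_rate": win_rate,
        "position_changes": changes,
    }


report = performance_report(pd.Series([0.10, -0.50, 0.20]), pd.Series([1, 1, 1]))
```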

Notice that two of these, the equity curve itself and turnover sensitivity, depend entirely on how accurately you model execution. Which brings us to the catch.
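Back-of-envelope arithmetic shows how quickly turnover compounds against a strategy. Assuming a flat 5 bps per position change and no edge at all, the share of capital lost to costs over n changes is 1 - (1 - c)^n; at just one change a day that is already about 17% a year.

```python
def annual_cost_drag(changes_per_day: float, cost_bps: float = 5.0, days: int = 365) -> float:
    """Share of capital lost to costs alone, compounding a fixed cost per position change."""
    n_changes = changes_per_day * days
    return 1.0 - (1.0 - cost_bps / 10_000) ** n_changes


for per_day in (1, 5, 20):
    print(f"{per_day:>2} changes/day at 5 bps: {annual_cost_drag(per_day):.1%} of capital lost per year")
```

Any gross edge has to clear this hurdle, which is why an underestimated cost per trade flatters high-turnover strategies most.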

The Catch: A Backtest Only Reveals What's in Your Data

Here is the uncomfortable truth most backtesting tutorials skip: a backtest is a function of its inputs. If your data doesn't contain liquidity information, your backtest cannot reveal liquidity risk — it will silently assume infinite liquidity at the close price.

This is the core failure of OHLCV-only backtests. A 5-minute candle tells you the high was $67,500, but not whether there was enough resting size at $67,500 to fill your order. Assume there was, and your backtest prints a beautiful equity curve that quietly evaporates in production. We unpack this in depth in Why OHLCV Models Fail.

Level 2 orderbook data closes the gap. With the resting bid/ask volume at multiple price levels, a backtest can sweep the book to compute the price you'd actually fill at — turning “assumed” execution into modelled execution. The full recipe is in Calculating Realistic Slippage with L2 Data.
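A minimal sketch of sweeping the book for a market order, assuming cumulative volume per level in base units and distances in basis points from mid, as the FAQ describes the schema. The function name is hypothetical, and `mid` could be approximated by the bar's close if no mid-price column is available. Because each row is a 5-minute aggregate, this approximates the book an order would have met; it ignores queue position and how fast the book refills.

```python
from typing import Sequence, Tuple


def sweep_fill_price(mid: float, cum_volumes: Sequence[float], distances_bps: Sequence[float],
                     qty: float, side: str = "buy") -> Tuple[float, float]:
    """Walk the book level by level; return (VWAP fill price, slippage in bps vs mid)."""
    if qty <= 0:
        raise ValueError("qty must be positive")
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0  # buys lift asks above mid, sells hit bids below
    remaining, notional, prev_cum = qty, 0.0, 0.0
    for cum, dist in zip(cum_volumes, distances_bps):
        level_qty = cum - prev_cum  # incremental size resting at this level
        prev_cum = cum
        take = min(remaining, level_qty)
        price = mid * (1.0 + sign * abs(dist) / 10_000)
        notional += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 1e-12:
        raise ValueError("order is larger than the visible depth; fill price unknown")
    vwap = notional / qty
    slippage_bps = sign * (vwap / mid - 1.0) * 10_000
    return vwap, slippage_bps


vwap, slip = sweep_fill_price(mid=100.0, cum_volumes=[2, 5, 10], distances_bps=[1, 2, 5], qty=4)
```

Buying 4 units against that ladder takes 2 at 1 bp and 2 at 2 bps, a VWAP of 100.015 and 1.5 bps of slippage that a close-price backtest would never charge.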

A Depth-Aware Backtest Loop

The skeleton of a realistic backtest isn't complicated — the realism comes from the data it reads. Here a position is decided from information available before each bar (no look-ahead), and every position change pays a cost:

```python
import pandas as pd

# One instrument file from the dataset; pandas decompresses .csv.gz automatically
df = pd.read_csv("BTC_5m_depth10_derived.csv.gz")

# Signal you only get from L2 depth: top-of-book imbalance (bid share of level-1 volume)
df["obi"] = df["bid_volume_level_1"] / (
    df["bid_volume_level_1"] + df["ask_volume_level_1"]
)

fee = 5 / 10_000            # 5 bps per position change (fees + slippage), i.e. 10 bps per round trip
equity, pos = 1.0, 0
curve = []                  # equity after each bar, for plotting or drawdown analysis

for i in range(1, len(df)):
    # Position for bar i comes from bar i-1's signal (causal, no look-ahead)
    new_pos = 1 if df["obi"].iloc[i - 1] > 0.55 else 0
    ret = df["close_price"].iloc[i] / df["close_price"].iloc[i - 1] - 1
    cost = fee if new_pos != pos else 0.0   # pay only when the position changes
    equity *= 1 + new_pos * ret - cost
    pos = new_pos
    curve.append(equity)

print(f"Final equity: {equity:.3f}  ({(equity - 1) * 100:+.1f}%)")
```

Swap the obi signal for an RSI or MACD rule and you have a fair comparison: does the orderbook signal — which needs depth data — beat a price-only indicator after costs? That question is the whole game, and it's exactly what the demo below lets you explore. (For the imbalance signal itself, see Orderbook Imbalance Signals.)
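A vectorized sketch of that comparison: the same long/flat rule and per-change cost as the loop, applied to any boolean signal, plus a common EMA-based approximation of Wilder's RSI. The function names and the RSI < 30 entry rule are hypothetical illustrations, not the demo's implementation.

```python
import pandas as pd


def backtest(close: pd.Series, long_signal: pd.Series, fee: float = 5 / 10_000) -> pd.Series:
    """Long/flat equity curve; long_signal[i] must be known at the close of bar i."""
    pos = long_signal.shift(1, fill_value=False).astype(float)  # act on the NEXT bar
    ret = (close / close.shift(1) - 1).fillna(0.0)
    cost = pos.diff().abs().fillna(0.0) * fee  # charged only when the position changes
    return (1 + pos * ret - cost).cumprod()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI with Wilder-style smoothing approximated by an EMA with alpha = 1/period."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = (-delta).clip(lower=0).ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)
```

With the `df` from the loop above, `backtest(df["close_price"], df["obi"] > 0.55).iloc[-1]` should match the loop's final equity up to floating-point rounding, and `backtest(df["close_price"], rsi(df["close_price"]) < 30)` gives a price-only rival under identical costs.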

Try a live backtest — no setup

Every dataset page has an interactive backtester running on a real 7-day orderbook sample. Move the sliders — OBI threshold, RSI settings, cost per trade — and watch the equity curve and drawdown react against buy & hold, in your browser, on real data.


The Data Backtesters Actually Need

If you're evaluating backtesters, you'll quickly find the engine is the easy part: Backtrader, vectorbt, QuantConnect, or a short loop like the one above all work. The bottleneck is feeding them clean, realistic data. For crypto specifically, that means:

Continuous history across a full market cycle — bull, bear, and chop — so regime sensitivity is testable.
Level 2 depth, not just OHLCV, so execution cost is modelled rather than assumed.
Clean, time-aligned bars — no gaps, no clock drift, no async WebSocket artifacts.
A signal-rich venue. We source from Hyperliquid's on-chain orderbook, where resting orders are real commitments rather than the spoofing that pollutes many CEX feeds — see how to get Hyperliquid historical data.
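Whatever the source, the "no gaps, no clock drift" property is cheap to verify before backtesting. A minimal sketch, assuming the bars carry a timestamp column (hypothetically named `timestamp`) of parseable datetimes; for epoch milliseconds, pass `unit="ms"`.

```python
from typing import Optional

import pandas as pd


def check_bar_grid(ts: pd.Series, freq: str = "5min", unit: Optional[str] = None) -> dict:
    """Count duplicate, missing and off-grid bars in a series of bar timestamps."""
    t = pd.to_datetime(ts, unit=unit, utc=True)
    duplicates = int(t.duplicated().sum())
    uniq = t.drop_duplicates().sort_values()
    # Every bar timestamp we would expect between the first and last bar
    expected = pd.date_range(uniq.iloc[0], uniq.iloc[-1], freq=freq)
    missing = expected.difference(pd.DatetimeIndex(uniq))
    # Timestamps not on a `freq` boundary hint at clock drift or misaligned aggregation
    off_grid = int((uniq.dt.floor(freq) != uniq).sum())
    return {
        "duplicates": duplicates,
        "missing": len(missing),
        "off_grid": off_grid,
        "first_missing": list(missing[:5]),
    }


bars = pd.Series(["2025-03-01 00:00", "2025-03-01 00:05", "2025-03-01 00:15", "2025-03-01 00:15"])
grid_report = check_bar_grid(bars)  # one duplicate, 00:10 missing
```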

Our historical orderbook datasets are built for exactly this: 24 instruments, 12+ months, 10-level depth, normalized into an analysis-ready 47-column schema you can drop straight into your backtester.

Frequently Asked Questions

What does a backtest actually reveal?

A backtest reveals how a trading rule would have behaved on historical data: its equity curve, total and risk-adjusted return (Sharpe), maximum drawdown, win rate, turnover, and how sensitive it is to different market regimes. Critically, it only reveals what your data contains — a backtest run on price-only OHLCV candles cannot reveal slippage, partial fills, or liquidity gaps, because that information simply isn't in the data.

What data do I need to backtest a crypto strategy properly?

At minimum you need clean, gap-free historical price and volume. To backtest realistically — modelling execution cost and market impact — you also need Level 2 orderbook depth: the resting bid/ask volume at multiple price levels. Depth data is what lets a backtest estimate the price you'd actually fill at, rather than assuming you trade unlimited size at the close.

Can I backtest with free OHLCV data?

You can, and it's a fine starting point for signal research. But OHLCV-only backtests systematically overstate performance because they ignore execution reality — every fill is assumed to happen at the candle price with infinite liquidity. The gap between an OHLCV backtest and live results (the 'backtest-to-live decay') is largely driven by this missing depth information.

Why do backtested strategies fail in live trading?

The three usual culprits are overfitting (the rule was tuned to historical noise), regime change (the future doesn't resemble the test window), and unrealistic execution assumptions (ignoring slippage, fees, and liquidity). The first two are about method; the third is about data, and it's the one orderbook depth data directly addresses.

Backtest on data that tells the truth

Stop measuring a fantasy. Feed your backtester real Hyperliquid Level 2 depth across 24 instruments and 12+ months — and start with a free 7-day sample.
