What is orderbook depth data and how is it different from OHLCV?

OHLCV (Open-High-Low-Close-Volume) candles only summarize price and traded volume for each interval. Orderbook depth data reveals the bid and ask liquidity actually resting at multiple price levels, showing how much buying and selling interest exists and how far from the mid-price it sits. Our dataset provides 10 levels of depth on both sides, measured in cumulative volume and basis-point distance from mid-price. This is the raw material that institutional traders and quantitative researchers use to detect order flow, predict short-term price movements, and model realistic execution costs.
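Distances are expressed in basis points (1 bp = 0.01%) relative to mid-price, so turning them back into absolute prices takes a single multiplication. The sketch below is illustrative only. `level_prices` is a hypothetical helper, the caller supplies the mid-price, and distances are treated as magnitudes, with asks above mid and bids below.

```python
def level_prices(mid: float, distances_bps: list[float], side: str) -> list[float]:
    """Convert basis-point distances from mid into absolute price levels.

    Asks sit above mid and bids below; abs() makes the result independent of
    whether bid distances are stored as positive or negative numbers.
    """
    if side not in ("bid", "ask"):
        raise ValueError("side must be 'bid' or 'ask'")
    sign = 1.0 if side == "ask" else -1.0
    # 1 bp = 1/10,000 of the mid-price
    return [mid * (1.0 + sign * abs(d) / 10_000) for d in distances_bps]
```

For example, `level_prices(67_350.0, [0.5, 1.2, 2.0], "ask")` gives the first three ask prices for a hypothetical mid of $67,350.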

Which instruments and timeframe does the dataset cover?

The current release covers 24 major crypto perpetual futures sourced from the leading Layer 1 perpetual-futures decentralized exchange: BTC, ETH, SOL, BNB, XRP, DOGE, ADA, AVAX, LINK, DOT, NEAR, SUI, OP, ARB, SEI, TIA, INJ, APT, FIL, LTC, ETC, WIF, XLM, and ATOM. All data is aggregated into 5-minute bars spanning March 2025 to February 2026, approximately 12 months of continuous coverage. Each instrument has ~96,000 bars with 47 derived columns per row, including OHLCV, bid/ask volumes, and bid/ask distances at 10 depth levels.

Can I use this data for machine learning and deep learning?

Absolutely. The dataset is specifically designed for ML workflows. The 47-column schema feeds directly into LSTM networks, Transformer architectures, reinforcement learning setups (for example, Gymnasium environments trained with Stable-Baselines3), and gradient-boosted models (XGBoost, LightGBM). Common derived features include bid-ask imbalance ratios, depth-weighted pressure scores, liquidity concentration metrics, and spread dynamics, all computable in a few lines of Python from our raw columns.
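The definitions below are one plausible way to compute the four feature families named above. They are our illustration, not the dataset's official definitions. `add_depth_features`, the output column names, the 1 / (1 + bps) weighting, and the top-5 cutoff are all assumptions. The code also assumes each volume column holds the volume at that level alone (not summed across levels) and that distances are in bps from mid.

```python
import numpy as np
import pandas as pd


def _side_cols(side: str, kind: str, top_n: int) -> list[str]:
    # Column names follow the article's convention, e.g. "bid_volume_level_1"
    return [f"{side}_{kind}_level_{i}" for i in range(1, top_n + 1)]


def _safe_ratio(num, den):
    # Element-wise num / den, with NaN wherever the denominator is zero
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.full_like(num, np.nan), where=den != 0)


def add_depth_features(df: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Return a copy of df with a few illustrative orderbook features added."""
    out = df.copy()
    bid_vol = out[_side_cols("bid", "volume", top_n)].to_numpy(dtype=float)
    ask_vol = out[_side_cols("ask", "volume", top_n)].to_numpy(dtype=float)
    bid_dist = np.abs(out[_side_cols("bid", "distance", top_n)].to_numpy(dtype=float))
    ask_dist = np.abs(out[_side_cols("ask", "distance", top_n)].to_numpy(dtype=float))

    bid_depth = bid_vol.sum(axis=1)
    ask_depth = ask_vol.sum(axis=1)

    # Bid-ask imbalance in [-1, 1]: positive when more volume rests on the bid side
    out["imbalance"] = _safe_ratio(bid_depth - ask_depth, bid_depth + ask_depth)

    # Depth-weighted pressure: like imbalance, but each level is down-weighted by 1 / (1 + distance_bps)
    wb = (bid_vol / (1.0 + bid_dist)).sum(axis=1)
    wa = (ask_vol / (1.0 + ask_dist)).sum(axis=1)
    out["pressure"] = _safe_ratio(wb - wa, wb + wa)

    # Liquidity concentration: share of top-N bid depth sitting at the best level
    out["bid_concentration_l1"] = _safe_ratio(bid_vol[:, 0], bid_depth)

    # Quoted spread in bps: best bid distance plus best ask distance from mid
    out["spread_bps"] = bid_dist[:, 0] + ask_dist[:, 0]
    return out
```

Rows with zero depth get NaN instead of raising a division error, so the output can go straight into a model pipeline that handles missing values.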

Can you create custom datasets with different instruments, timeframes, or depth levels?

Yes. We can generate orderbook depth data for any cryptocurrency available on major derivatives DEXs and CEXs, at any candle interval (1-minute, 5-minute, 15-minute, 1-hour, etc.), covering any historical time period. We also support extended depth up to 30 levels. Contact us at imbalancelabs@gmail.com with your requirements for a custom quote.

Is this data legally safe to use?

Yes. Our datasets are classified as Derived Data — an Aggregated Liquidity and Orderbook Depth Index. All raw order book snapshots have been aggregated across time intervals, normalized, and transformed through statistical computations. The original tick-level data is not recoverable from our product. This classification places it outside the scope of most exchange data redistribution restrictions.

What format is the data delivered in?

All datasets are delivered as compressed CSV files (.csv.gz), which can be loaded directly by Python pandas, R, DuckDB, Apache Spark, and most data analysis tools. Each file is named by instrument (e.g., BTC_5m_depth10_derived.csv.gz). The full dataset ZIP containing all 24 instruments is approximately 300 MB.
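To work with all 24 instruments at once, one option is to stack the files into a single long table with an instrument column. This is a minimal sketch. `load_all` is a hypothetical helper, and it assumes every file follows the same naming pattern as the BTC example.

```python
import glob
import os

import pandas as pd


def load_all(directory: str, pattern: str = "*_5m_depth10_derived.csv.gz") -> pd.DataFrame:
    """Stack every per-instrument file in `directory` into one long DataFrame.

    Assumes files are named <SYMBOL>_5m_depth10_derived.csv.gz, like the BTC example.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        symbol = os.path.basename(path).split("_", 1)[0]  # e.g. "BTC"
        frame = pd.read_csv(path)  # gzip compression is inferred from the .gz suffix
        frame.insert(0, "instrument", symbol)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")
    return pd.concat(frames, ignore_index=True)
```

After unzipping the full dataset, `panel = load_all("path/to/dataset")` followed by `panel.groupby("instrument")` gives per-instrument access without juggling 24 separate frames.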

Market Microstructure · 8 min read · Dec 15, 2025

Why OHLCV Models Fail: Estimating Slippage with DEX L2 Data

By Imbalance Labs Research

The Illusion of Liquidity

Every quantitative researcher starts with the same dataset: OHLCV candles. Open, High, Low, Close, Volume — the five pillars of traditional market analysis. These features are ubiquitous, free, and easy to source. They are also dangerously misleading.

The fundamental problem with OHLCV data is that it compresses an entire trading period into a single price range. A 5-minute BTC candle might show a high of $67,500 and a low of $67,200, a $300 range. What it doesn't tell you is whether there was enough liquidity at $67,500 to actually fill a meaningful order. In most cases, there wasn't.

This is the illusion of liquidity. A model trained on OHLCV data learns that prices “were” at certain levels. But it has no concept of how much capital was available at those levels. The result? Backtests that show beautiful equity curves, but fail catastrophically in live trading due to slippage, partial fills, and market impact.

What Level 2 Data Reveals

Level 2 orderbook data solves this problem by revealing the actual depth of the market. Instead of a single price point, you see the cumulative volume sitting at multiple price levels on both the bid and ask sides.

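Our BTC historical orderbook dataset captures 10 levels of depth on each side: bid_volume_level_1 through bid_volume_level_10 and ask_volume_level_1 through ask_volume_level_10, along with each level's basis-point distance from mid-price (bid_distance_level_1 through bid_distance_level_10, plus the matching ask_distance_level_* columns).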
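The code in this article treats each level's volume as the liquidity available at that level alone. Both the FAQ and the paragraph above describe the volumes as cumulative, so check the schema documentation. If each `*_volume_level_i` column already includes all better levels, take first differences across levels before walking the book. Here is a sketch of that conversion; `to_per_level` is a hypothetical helper.

```python
import pandas as pd


def to_per_level(df: pd.DataFrame, side: str, levels: int = 10) -> pd.DataFrame:
    """Convert depth that is cumulative across levels into volume at each level alone.

    Only needed if {side}_volume_level_i already includes all better levels.
    """
    cols = [f"{side}_volume_level_{i}" for i in range(1, levels + 1)]
    per_level = df[cols].diff(axis=1)  # level i minus level i-1
    per_level[cols[0]] = df[cols[0]]   # the best level has nothing to subtract
    return per_level
```

A quick heuristic: if volumes never decrease from level 1 to level 10 in any row, the columns are probably cumulative, since independent per-level volumes would typically dip somewhere.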

Critically, our data comes from Hyperliquid, a fully on-chain decentralized exchange. On centralized exchanges (CEXs), spoofing (placing and then canceling large orders to manipulate price) is a well-documented problem. DEX orderbooks, by contrast, represent committed liquidity: every order you see in our dataset was a real, on-chain commitment of capital.

Estimating Real Slippage

With L2 depth data, you can estimate real execution costs before placing a trade. The basic formula for estimated slippage on a buy order of size Q is:

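$$\mathrm{Slippage}(Q) = \frac{1}{Q} \sum_{i=1}^{10} \min\left(Q^{\mathrm{rem}}_i,\ v_i\right) d_i, \qquad Q^{\mathrm{rem}}_1 = Q, \quad Q^{\mathrm{rem}}_{i+1} = Q^{\mathrm{rem}}_i - \min\left(Q^{\mathrm{rem}}_i,\ v_i\right)$$

where $v_i$ is the ask volume at level $i$ (in the same units as $Q$) and $d_i$ is that level's distance from mid-price in basis points.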
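To make the formula concrete, here is a worked example on a hypothetical three-level ask book, written in plain Python with `(volume, distance_bps)` pairs. The book values are made up for illustration. A $50,000 buy takes all $20,000 at 1 bp, all $20,000 at 2 bps, and $10,000 of the third level at 4 bps, so the average slippage is (20,000 × 1 + 20,000 × 2 + 10,000 × 4) / 50,000 = 2 bps.

```python
def slippage_bps(order_size: float, book: list[tuple[float, float]]) -> float:
    """Average slippage (bps from mid) for a market buy walking the ask levels in order.

    `book` is a list of (volume, distance_bps) pairs, best level first,
    with volume in the same units as `order_size`.
    """
    remaining = order_size
    weighted = 0.0
    for volume, distance_bps in book:
        filled = min(remaining, volume)
        weighted += filled * distance_bps
        remaining -= filled
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order is larger than the depth in the book")
    return weighted / order_size


# Hypothetical three-level ask book: (USD volume, distance from mid in bps)
book = [(20_000, 1.0), (20_000, 2.0), (30_000, 4.0)]
print(slippage_bps(50_000, book))  # 2.0
```

Raising on insufficient depth is a deliberate choice here: dividing a partial fill's cost by the full order size would understate slippage.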

Here's how to compute this with our dataset:

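import pandas as pd

# Load BTC Level 2 depth data (delivered as gzip-compressed CSV; pandas infers compression from .gz)
df = pd.read_csv('BTC_5m_depth10_derived.csv.gz')

def estimate_slippage(row, order_size_usd=50_000):
    """Estimate average slippage (bps from mid) for a market buy of the given USD size.

    Assumes ask_volume_level_i is the USD notional resting at level i alone
    (not cumulative across levels) and ask_distance_level_i is in basis points.
    """
    remaining = order_size_usd
    weighted_distance = 0.0

    # Walk the ask side from the best level outward, filling as much as each level allows
    for level in range(1, 11):
        vol = row[f'ask_volume_level_{level}']
        dist = row[f'ask_distance_level_{level}']  # in basis points
        filled = min(remaining, vol)
        weighted_distance += filled * dist
        remaining -= filled
        if remaining <= 0:
            break

    # If 10 levels cannot absorb the order, dividing by the full size would
    # understate the true cost, so flag the row instead
    if remaining > 0:
        return float('nan')

    return weighted_distance / order_size_usd  # avg slippage in bps

df['slippage_50k'] = df.apply(
    lambda r: estimate_slippage(r, 50_000), axis=1
)

# median() and quantile() skip NaN rows (bars with insufficient depth)
print(f"Median slippage for $50K order: {df['slippage_50k'].median():.2f} bps")
print(f"95th percentile: {df['slippage_50k'].quantile(0.95):.2f} bps")
print(f"Bars with insufficient depth: {df['slippage_50k'].isna().mean():.1%}")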
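With ~96,000 bars per instrument, the row-by-row `df.apply` above is slow. The same fill logic can be vectorized with NumPy: the amount filled at level i equals min(cumulative depth through level i, Q) minus min(cumulative depth through level i-1, Q). The sketch below assumes per-level USD notional volumes and bps distances, as `estimate_slippage` does, and returns NaN where the levels cannot absorb the order. `estimate_slippage_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def estimate_slippage_vectorized(
    df: pd.DataFrame, order_size_usd: float = 50_000, levels: int = 10
) -> pd.Series:
    """Average buy-side slippage in bps for every row at once (NaN if depth is insufficient)."""
    if order_size_usd <= 0:
        raise ValueError("order_size_usd must be positive")
    vols = df[[f"ask_volume_level_{i}" for i in range(1, levels + 1)]].to_numpy(dtype=float)
    dists = df[[f"ask_distance_level_{i}" for i in range(1, levels + 1)]].to_numpy(dtype=float)

    # Cumulative depth through each level, capped at the order size
    capped = np.minimum(np.cumsum(vols, axis=1), order_size_usd)
    # Amount filled at each level = increase in capped cumulative depth
    filled = np.diff(capped, axis=1, prepend=0.0)
    avg_bps = (filled * dists).sum(axis=1) / order_size_usd

    # Rows where all levels together cannot absorb the order get NaN
    enough_depth = capped[:, -1] >= order_size_usd
    result = np.where(enough_depth, avg_bps, np.nan)
    return pd.Series(result, index=df.index, name=f"slippage_{int(order_size_usd)}")
```

Because it operates on whole columns instead of individual rows, this avoids a Python-level loop per bar; the loop version remains easier to read and extend.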

Building Robust Backtests

Armed with slippage estimates, you can build backtesting engines that model realistic execution. Instead of assuming fills at the close price (as most OHLCV backtests do), you can:

Simulate order fills level-by-level through the orderbook
Apply dynamic transaction costs based on actual depth
Detect low-liquidity regimes where your strategy should reduce position size
Model market impact for larger orders using depth decay functions

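# Realistic average fill price using the top 5 levels of depth
def realistic_fill_price(row, side='buy', order_usd=10_000):
    # Distances are measured from mid-price; close_price stands in for mid here (an approximation)
    price = row['close_price']
    if order_usd <= 0:
        return price
    remaining = order_usd
    quantity = 0.0  # base-asset units filled so far
    levels = 'ask' if side == 'buy' else 'bid'
    sign = 1 if side == 'buy' else -1  # buys fill above mid, sells below

    for i in range(1, 6):
        vol = row[f'{levels}_volume_level_{i}']  # assumed USD notional at this level
        dist_bps = abs(row[f'{levels}_distance_level_{i}'])  # magnitude only; direction comes from sign
        level_price = price * (1 + sign * dist_bps / 10_000)
        filled = min(remaining, vol)
        quantity += filled / level_price  # convert USD notional to base units at this level's price
        remaining -= filled
        if remaining <= 0:
            break

    if remaining > 0:
        return float('nan')  # top 5 levels cannot fill the whole order

    # Volume-weighted average price: USD spent (or received) divided by units traded
    return order_usd / quantity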
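For the third idea in the list above, detecting low-liquidity regimes, one simple approach compares current depth with its trailing median and scales position size down when the book is thin. This is our own illustration rather than a method from the dataset documentation. `liquidity_scale`, the top-5 cutoff, the 288-bar window (one day of 5-minute bars), and the 0.25 floor are all hypothetical choices.

```python
import pandas as pd


def liquidity_scale(
    df: pd.DataFrame, top_n: int = 5, window: int = 288, floor: float = 0.25
) -> pd.Series:
    """Position-size multiplier in [floor, 1] based on how thin the book is right now.

    1.0 means depth is at or above its trailing median; smaller values mean
    depth has thinned out relative to recent history.
    """
    cols = [f"{side}_volume_level_{i}" for side in ("bid", "ask") for i in range(1, top_n + 1)]
    depth = df[cols].sum(axis=1)
    # Median of the previous `window` bars; shift(1) keeps the current bar out of its own baseline
    baseline = depth.rolling(window, min_periods=window).median().shift(1)
    ratio = depth / baseline
    # Clip to [floor, 1]; during warm-up there is no baseline yet, so stay conservative at floor
    return ratio.clip(lower=floor, upper=1.0).fillna(floor)
```

Multiplying a strategy's target position by `liquidity_scale(df)` shrinks exposure automatically when resting depth falls below its recent norm, without any lookahead into future bars.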

Conclusion

OHLCV data is a useful starting point, but it is fundamentally insufficient for serious quantitative research. The gap between backtested performance and live performance — the so-called “backtest-to-live decay” — is largely driven by unrealistic assumptions about liquidity and execution costs. Level 2 orderbook depth data, especially from transparent on-chain venues like Hyperliquid, provides the foundation for models that actually work in production.

Get the Data

Start building slippage-aware models with institutional-grade orderbook depth data across 24 crypto instruments.


Full 47-column schema documentation available.