What is orderbook depth data and how is it different from OHLCV?

OHLCV (Open-High-Low-Close-Volume) candles summarize only price and traded volume. Orderbook depth data shows the bid and ask liquidity resting at multiple price levels: how much buying and selling interest exists, and how far from the mid-price it sits. Our dataset provides 10 levels of depth on each side, expressed as cumulative volume and basis-point distance from mid-price. This is the raw material institutional traders and quantitative researchers use to detect order flow, predict short-term price movements, and model realistic execution costs.
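The next sketch makes the two measurements concrete by turning one raw snapshot into cumulative volume and basis-point distance per level. It assumes distances are positive basis points from mid-price on both sides and that volume accumulates from the best level outward. The toy prices and sizes are invented, and the product's exact sign and unit conventions are defined by its schema, not by this code.

```python
# Toy raw L2 snapshot: (price, size) pairs, best level first (invented values).
bids = [(99.9, 2.0), (99.8, 3.0), (99.5, 5.0)]
asks = [(100.1, 1.0), (100.2, 4.0), (100.6, 2.0)]


def depth_profile(levels, mid, side):
    """Return (cumulative_size, distance_bps) for each level of one side."""
    out = []
    cum = 0.0
    for px, sz in levels:
        cum += sz  # cumulative volume from the best level outward
        # Distance from mid as a positive number of basis points (assumed convention).
        if side == "bid":
            dist_bps = (mid - px) / mid * 1e4
        else:
            dist_bps = (px - mid) / mid * 1e4
        out.append((cum, dist_bps))
    return out


mid = (bids[0][0] + asks[0][0]) / 2  # mid-price from the best bid and best ask
print(depth_profile(bids, mid, "bid"))
print(depth_profile(asks, mid, "ask"))
```

Here the bid side reads 2, 5 and 10 units of cumulative size at roughly 10, 20 and 50 bps below mid.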

Which instruments and timeframe does the dataset cover?

The current release covers 24 major crypto perpetual futures sourced from the leading Layer-1 perpetual decentralized exchange: BTC, ETH, SOL, BNB, XRP, DOGE, ADA, AVAX, LINK, DOT, NEAR, SUI, OP, ARB, SEI, TIA, INJ, APT, FIL, LTC, ETC, WIF, XLM, and ATOM. All data is aggregated into 5-minute bars spanning March 2025 to February 2026, approximately 12 months of continuous coverage. Each instrument has ~96,000 bars with 47 derived columns per row, including OHLCV, bid/ask volumes, and bid/ask distances at 10 depth levels.

Can I use this data for machine learning and deep learning?

Absolutely. The dataset is specifically designed for ML workflows. The 47-column schema feeds directly into LSTM networks, Transformer architectures, reinforcement learning environments (like Gymnasium/Stable-Baselines3), and gradient-boosted models (XGBoost, LightGBM). Common derived features include bid-ask imbalance ratios, depth-weighted pressure scores, liquidity concentration metrics, and spread dynamics — all computable in a few lines of Python from our raw columns.
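The following is one possible implementation of the four feature families on a wide DataFrame. It is a sketch under assumptions. The volume columns are taken to be named bid_volume_level_k and ask_volume_level_k and to hold cumulative volume. The distance columns bid_distance_level_k and ask_distance_level_k (bps from mid) are hypothetical names. The feature definitions are illustrative choices, not the dataset's own.

```python
import numpy as np
import pandas as pd


def add_depth_features(df: pd.DataFrame, levels: int = 10) -> pd.DataFrame:
    """Add illustrative depth features; column names are assumptions."""
    out = df.copy()
    ks = range(1, levels + 1)
    bid_cum = out[[f"bid_volume_level_{k}" for k in ks]].to_numpy(dtype=float)
    ask_cum = out[[f"ask_volume_level_{k}" for k in ks]].to_numpy(dtype=float)
    # Hypothetical distance column names, in bps from mid-price.
    bid_dist = out[[f"bid_distance_level_{k}" for k in ks]].to_numpy(dtype=float)
    ask_dist = out[[f"ask_distance_level_{k}" for k in ks]].to_numpy(dtype=float)

    # 1) Bid-ask imbalance over the full visible book, in [-1, 1].
    total = bid_cum[:, -1] + ask_cum[:, -1]
    out["imbalance_full"] = (bid_cum[:, -1] - ask_cum[:, -1]) / total

    # 2) Depth-weighted pressure: size added at each level, down-weighted by distance.
    bid_inc = np.diff(bid_cum, axis=1, prepend=0.0)
    ask_inc = np.diff(ask_cum, axis=1, prepend=0.0)
    w_bid = (bid_inc / (1.0 + bid_dist)).sum(axis=1)
    w_ask = (ask_inc / (1.0 + ask_dist)).sum(axis=1)
    out["weighted_pressure"] = (w_bid - w_ask) / (w_bid + w_ask)

    # 3) Liquidity concentration: share of visible depth sitting at the top level.
    out["bid_concentration"] = bid_cum[:, 0] / bid_cum[:, -1]
    out["ask_concentration"] = ask_cum[:, 0] / ask_cum[:, -1]

    # 4) Quoted spread in bps: top-level distances on each side of mid add up.
    out["spread_bps"] = bid_dist[:, 0] + ask_dist[:, 0]
    return out
```

Differencing the cumulative columns recovers the size resting at each individual level. The pressure score needs that per-level size; the imbalance ratio can use the cumulative totals directly.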

Can you create custom datasets with different instruments, timeframes, or depth levels?

Yes. We can generate orderbook depth data for any cryptocurrency available on major derivatives DEXs and CEXs, at any candle interval (1-minute, 5-minute, 15-minute, 1-hour, etc.), covering any historical time period. We also support extended depth up to 30 levels. Contact us at imbalancelabs@gmail.com with your requirements for a custom quote.

Is this data legally safe to use?

Yes. Our datasets are classified as Derived Data — an Aggregated Liquidity and Orderbook Depth Index. All raw order book snapshots have been aggregated across time intervals, normalized, and transformed through statistical computations. The original tick-level data is not recoverable from our product. This classification places it outside the scope of most exchange data redistribution restrictions.

What format is the data delivered in?

All datasets are delivered as compressed CSV files (.csv.gz), which can be loaded directly by Python pandas, R, DuckDB, Apache Spark, and most data analysis tools. Each file is named by instrument (e.g., BTC_5m_depth10_derived.csv.gz). The full dataset ZIP containing all 24 instruments is approximately 300 MB.
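If you extract the full ZIP into one folder, the sketch below stacks all instruments into a single long DataFrame with an instrument column. It assumes every file follows the BTC_5m_depth10_derived.csv.gz naming pattern and that all files share the same columns. load_all is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


def load_all(data_dir: str) -> pd.DataFrame:
    """Stack every per-instrument depth file into one long DataFrame."""
    frames = []
    for path in sorted(Path(data_dir).glob("*_5m_depth10_derived.csv.gz")):
        # pandas infers gzip compression from the .gz suffix.
        df = pd.read_csv(path)
        # The instrument symbol is the filename prefix before the first underscore.
        df.insert(0, "instrument", path.name.split("_", 1)[0])
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no *_5m_depth10_derived.csv.gz files in {data_dir}")
    return pd.concat(frames, ignore_index=True)
```

If the combined frame is too large for memory, DuckDB's read_csv_auto function can query the same .csv.gz files with SQL instead.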

Data Sourcing · 9 min read · Jun 21, 2026

Hyperliquid Historical Orderbook Data: How to Get L2 Depth

By Imbalance Labs Research

TL;DR

Hyperliquid is the most active on-chain perpetuals exchange, but it does not expose a ready-made historical Level 2 orderbook API. You have three options: (1) record the live WebSocket feed yourself, (2) reconstruct the book from raw node archives, or (3) buy a cleaned, time-aligned dataset. This guide explains the trade-offs and shows the Python to load L2 depth once you have it.

Why Hyperliquid Orderbook Data Is Worth the Trouble

Hyperliquid runs a fully on-chain central limit order book (CLOB) for perpetual futures. Unlike most centralized exchanges, where market makers post and pull zero-fee quotes thousands of times per second, every resting order on Hyperliquid is a genuine on-chain commitment. That property makes its Level 2 depth one of the cleanest microstructure signals available in crypto — far less polluted by spoofing and phantom liquidity. For anyone modeling slippage, order-flow imbalance, or statistical arbitrage, this is exactly the kind of orderbook data you want.

The catch: getting historical depth — not just a live feed — is where most researchers hit a wall. Below are the three realistic paths, from hardest to easiest.

Option 1 — The Native Hyperliquid API (and its limits)

Hyperliquid exposes a public REST endpoint, POST https://api.hyperliquid.xyz/info. The relevant request types for orderbook work are:

l2Book — returns the current L2 snapshot for one coin (bid/ask levels with price, size, and order count). It is a point-in-time snapshot, not a history.
candleSnapshot — returns OHLCV candles with a limited lookback window. Useful for price, but it carries no depth information at all.

There is also a WebSocket at wss://api.hyperliquid.xyz/ws where you can subscribe to a live l2Book stream. This is the canonical way to start collecting depth — but only from the moment you connect, forward. If you want last year's data, the API cannot give it to you.

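import requests

# Current L2 snapshot for BTC: a single point in time, NOT history
resp = requests.post(
    "https://api.hyperliquid.xyz/info",
    json={"type": "l2Book", "coin": "BTC"},
    timeout=10,  # fail fast instead of hanging on a stalled connection
)
resp.raise_for_status()  # surface HTTP errors before parsing
book = resp.json()
bids, asks = book["levels"]  # [bids, asks]; each level: {"px", "sz", "n"}, px/sz as strings
print("Best bid:", bids[0], "  Best ask:", asks[0])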
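The forward-only WebSocket route can be sketched as follows. It sends the l2Book subscription for one coin and appends each book message to a JSON-lines file. This is a minimal sketch that assumes the third-party websockets package. It deliberately has no reconnect logic, so any disconnect ends the recording. record and parse_l2 are hypothetical helpers.

```python
import json

WS_URL = "wss://api.hyperliquid.xyz/ws"


def subscribe_msg(coin: str) -> str:
    """JSON subscription request for the live l2Book stream of one coin."""
    return json.dumps({"method": "subscribe", "subscription": {"type": "l2Book", "coin": coin}})


def parse_l2(raw: str):
    """Return (time_ms, best_bid_px, best_ask_px) for l2Book messages, else None."""
    msg = json.loads(raw)
    if msg.get("channel") != "l2Book":
        return None  # e.g. the subscription acknowledgement
    data = msg["data"]
    bids, asks = data["levels"]
    if not bids or not asks:
        return None
    return data["time"], float(bids[0]["px"]), float(asks[0]["px"])


async def record(coin: str, out_path: str) -> None:
    """Append every raw l2Book message to a JSON-lines file until the socket closes."""
    import websockets  # third-party dependency: pip install websockets

    async with websockets.connect(WS_URL) as ws:
        await ws.send(subscribe_msg(coin))
        with open(out_path, "a") as f:
            async for raw in ws:
                if parse_l2(raw) is not None:
                    f.write(raw + "\n")
```

Start it with asyncio.run(record("BTC", "btc_l2.jsonl")). It runs until the connection drops or you interrupt it, which is exactly the fragility the verdict below describes.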

Verdict: great for live data and prototyping, useless for backfilling history. To build a 12-month dataset this way you would have to run a collector continuously for 12 months — and still handle reconnects, dropped messages, and clock drift.
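Once raw snapshots are on disk, they still have to be placed on a fixed grid. The minimal pandas sketch below assumes a time column in epoch milliseconds taken from the exchange message rather than from your own clock, which largely sidesteps local clock drift. It keeps the last snapshot in each 5-minute bar and flags bars with no snapshot at all; dropped messages and reconnect gaps show up in those flagged bars. Taking the last snapshot is only one option (a time-weighted average is another). to_bars is a hypothetical helper.

```python
import pandas as pd


def to_bars(snaps: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
    """Resample irregular snapshots onto a fixed grid.

    snaps needs a 'time' column (exchange timestamp, epoch ms) plus numeric
    depth columns. Each bar keeps the LAST snapshot seen inside it.
    """
    s = (
        snaps.assign(time=pd.to_datetime(snaps["time"], unit="ms", utc=True))
        .set_index("time")
        .sort_index()
    )
    r = s.resample(freq)
    bars = r.last()                      # NaN rows for bars with no snapshot
    bars["n_snapshots"] = r.size()       # 0 where nothing was recorded
    bars["missing"] = bars["n_snapshots"] == 0
    return bars
```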

Option 2 — Reconstruct From Raw Node Archives

Because Hyperliquid is a blockchain, the full order-by-order history exists in the chain's raw node data, which is published as a requester-pays S3 archive. In principle you can download it and replay every order placement, fill, and cancellation to rebuild the L2 book at any past timestamp.

In practice this is a serious data-engineering effort:

The archives are large (terabyte-scale) and you pay egress to pull them.
You must write a deterministic replay engine that maintains the book state per instrument and emits snapshots on a fixed cadence.
You have to handle aggregation, mid-price normalization, and gap filling before the data is usable for modeling.
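The core of such a replay engine is small, even if the surrounding engineering is not. The sketch below assumes a hypothetical, time-ordered event stream with fields ts, oid, side ("B"/"A"), px, sz and kind ("add", "fill", "cancel"). The real archive format differs and would have to be mapped onto something like this. The engine keeps per-order state so that fills and cancels reduce the correct price level, and it emits the top n levels at every fixed boundary. L2Replayer and replay are hypothetical names.

```python
from collections import defaultdict


class L2Replayer:
    """Rebuild an aggregated L2 book from a hypothetical order-event stream."""

    def __init__(self):
        self.orders = {}  # oid -> [side, px, remaining size]
        self.depth = {"B": defaultdict(float), "A": defaultdict(float)}  # px -> total size

    def _reduce(self, oid, sz):
        side, px, rem = self.orders[oid]
        take = min(sz, rem)
        self.depth[side][px] -= take
        if self.depth[side][px] <= 1e-12:  # drop empty (or float-dust) levels
            del self.depth[side][px]
        rem -= take
        if rem <= 1e-12:
            del self.orders[oid]
        else:
            self.orders[oid][2] = rem

    def apply(self, ev):
        kind, oid = ev["kind"], ev["oid"]
        if kind == "add":
            self.orders[oid] = [ev["side"], ev["px"], ev["sz"]]
            self.depth[ev["side"]][ev["px"]] += ev["sz"]
        elif kind == "fill" and oid in self.orders:
            self._reduce(oid, ev["sz"])
        elif kind == "cancel" and oid in self.orders:
            self._reduce(oid, self.orders[oid][2])  # remove whatever is left

    def snapshot(self, n=10):
        bids = sorted(self.depth["B"].items(), reverse=True)[:n]  # highest price first
        asks = sorted(self.depth["A"].items())[:n]                # lowest price first
        return bids, asks


def replay(events, every_ms, n=10):
    """Yield (boundary_ms, bids, asks) on a fixed cadence; events must be time-ordered."""
    book = L2Replayer()
    next_emit = None
    for ev in events:
        if next_emit is None:
            next_emit = (ev["ts"] // every_ms + 1) * every_ms
        while ev["ts"] >= next_emit:  # emit every boundary crossed before this event
            yield (next_emit, *book.snapshot(n))
            next_emit += every_ms
        book.apply(ev)
```

Each emitted snapshot reflects the book just before its boundary timestamp. Aggregation into cumulative volume and basis-point distance would then be applied to these snapshots as a separate step.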

Verdict: this is the only fully self-serve route to deep history, but it can consume weeks of engineering and meaningful cloud spend before you write a single line of research code. This is precisely the problem we built Imbalance Labs to remove.

Option 3 — Ready-to-Use Derived Datasets

The third option is to skip collection entirely and start from a cleaned, time-aligned dataset. Our historical orderbook datasets reconstruct Hyperliquid L2 depth, aggregate it into fixed bars, and normalize it into an analysis-ready schema. The 5-minute Standard release covers 24 instruments over 12+ months with 47 engineered columns per row: OHLCV, cumulative bid/ask volume at 10 depth levels, and the basis-point distance of each level from mid-price. See the full 47-column schema for the exact field list.

Because the output is a plain compressed CSV (or Parquet), loading a full instrument's history is a one-liner:

import pandas as pd

# Cleaned, time-aligned Hyperliquid L2 depth: 12+ months in one file.
# pandas infers gzip compression from the .csv.gz suffix; use pd.read_parquet for a Parquet delivery.
df = pd.read_csv("BTC_5m_depth10_derived.csv.gz", parse_dates=["timestamp_utc"])

# Order-book imbalance at the top of book: bid share of level-1 volume, in [0, 1]
df["obi_l1"] = df["bid_volume_level_1"] / (
    df["bid_volume_level_1"] + df["ask_volume_level_1"]
)

print(df[["timestamp_utc", "close_price", "obi_l1"]].tail())

From here you can go straight to research — computing imbalance signals, estimating slippage, or feeding 12 months of depth into an RL environment — instead of babysitting a WebSocket collector.
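Estimating slippage from depth columns comes down to walking the book. The minimal sketch below takes one side's cumulative volumes and basis-point distances for a single bar: asks for a buy, bids for a sell. It assumes all size at a level fills at that level's distance, and it returns the volume-weighted average distance from mid paid by the order. Fees, queue position, and liquidity that refills during execution are all ignored. estimate_slippage_bps is a hypothetical helper.

```python
import numpy as np


def estimate_slippage_bps(cum_volume, distance_bps, order_size):
    """Volume-weighted average distance from mid (bps) to fill order_size.

    Walks one side of the book level by level. Returns None if the order
    is larger than the visible depth.
    """
    if order_size <= 0:
        raise ValueError("order_size must be positive")
    cum = np.asarray(cum_volume, dtype=float)
    dist = np.asarray(distance_bps, dtype=float)
    if order_size > cum[-1]:
        return None  # visible depth is insufficient
    level_size = np.diff(cum, prepend=0.0)  # size resting at each level
    prev_cum = cum - level_size             # size consumed before reaching each level
    filled = np.clip(order_size - prev_cum, 0.0, level_size)
    return float((filled * dist).sum() / order_size)
```

Take levels with cumulative size 1, 5 and 7 at 10, 20 and 60 bps. A size-3 order fills 1 at 10 bps and 2 at 20 bps, an average of 50/3 ≈ 16.7 bps.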

Which Option Should You Choose?

Approach	Gets History?	Effort	Best For
Native /info API + WS	Forward only	Medium (ongoing)	Live trading, prototyping
Raw node archives	Yes, full	Very high	Teams with data infra
Derived datasets	Yes, instant	None	Research & backtesting

If your edge is in research and you bill your time at anything close to a quant's rate, Option 3 is almost always the rational choice. The cost of a dataset is a rounding error against weeks of pipeline engineering.

Next Steps: From Data to Signal

Once you have clean Hyperliquid depth, the interesting work begins. Two companion reads:

Orderbook Imbalance Signals — turning bid/ask depth into predictive features.
Why OHLCV Models Fail — why depth-aware slippage modeling beats candle-only backtests.

Frequently Asked Questions

Does Hyperliquid have a historical orderbook data API?

Not directly. Hyperliquid's public /info endpoint returns the current L2 book snapshot (l2Book) and a limited window of OHLCV candles (candleSnapshot), but neither returns a deep, continuous history of Level 2 depth. To build a historical L2 dataset you must either record the WebSocket feed yourself over time, or reconstruct the book from raw node archives — both are non-trivial engineering projects.

Can I get free Hyperliquid historical data?

You can pull recent snapshots and candles for free from the public API, and you can capture the live WebSocket feed at no cost going forward. What you cannot get for free is clean, gap-free, multi-month historical L2 depth — that requires either months of self-collected data or the compute to replay raw node archives. Imbalance Labs offers a free 7-day sample of the processed dataset so you can evaluate the schema before buying.

What is the difference between L2 orderbook data and OHLCV candles?

OHLCV candles only describe price and traded volume over an interval. Level 2 (L2) orderbook data shows the resting bid and ask liquidity at multiple price levels at each point in time — the depth that determines real execution cost and slippage. OHLCV tells you what happened; L2 depth tells you how much capital was actually available to trade against.

Why use Hyperliquid data instead of a centralized exchange (CEX)?

Hyperliquid runs a fully on-chain central limit order book, so resting orders are real on-chain commitments rather than the zero-fee API quotes that fuel spoofing on many CEX venues. That makes the depth signal cleaner for microstructure research, slippage modeling, and statistical arbitrage.

Skip the Pipeline

Get cleaned, time-aligned Hyperliquid L2 orderbook depth across 24 instruments and 12+ months — ready for Pandas and DuckDB. Try the free 7-day sample first.


Full 47-column schema documentation available.