HISTORICAL L2 ORDERBOOK DATA — HYPERLIQUID DEX — 24 INSTRUMENTS — 12+ MONTHS

DEX Orderbook Data
That Quants Actually Need.

Historical Level 2 Orderbook Depth Data. Cleaned, Normalized, Time-Aligned. Noise-Free Microstructure.

Ready-to-use algorithmic trading datasets for institutional backtesting and stat-arb.
Download ready. Compressed CSV. One-time purchase.

BTC-USDT · ETH-USDT · SOL-USDT · BNB-USDT · XRP-USDT · DOGE-USDT · ADA-USDT · AVAX-USDT · LINK-USDT · DOT-USDT · NEAR-USDT · SUI-USDT · APT-USDT · ARB-USDT · OP-USDT · INJ-USDT · SEI-USDT · TIA-USDT · WIF-USDT · FIL-USDT · LTC-USDT · ETC-USDT · ATOM-USDT · XLM-USDT
The DEX Edge

Why DEX Data? Escaping the CEX Noise.

The CEX Problem

CEX Orderbooks Are Full of Noise

  • CEX Spoofing: Traditional orderbooks (like Binance) are filled with HFT spoofing and microstructural noise
  • Months of Data Engineering: Building L2 depth pipelines from raw decentralized exchange archives requires custom parsers, timestamp alignment, and massive compute — before you even start your research
  • Zero-fee API spam creates phantom liquidity that disappears at execution
  • Raw L2/L3 feeds produce terabytes of microstructural noise
The DEX Advantage

Cleaned and Normalized L2 Data, Ready to Use

  • True Market Intent: DEX orderbooks from top L1 Perp chains reflect genuine liquidity and true institutional positioning, free from zero-fee API spam
  • Ready-to-Use CSV Orderbook Datasets: We've done the heavy data engineering — our proprietary pipeline handles the raw exchange data, aligns timestamps, normalizes depth, and packages it for instant Pandas/DuckDB ingestion
  • Time-aligned orderbook bars — no clock drift, no asynchrony. High-fidelity 5-minute resolution for noise-free microstructure research
  • 10-level depth — granular bid/ask profiles, cumulative volumes, and distance from mid-price
  • Orderbook Imbalance & Stat-Arb Ready: derived depth features that make imbalance a one-line computation, and the missing piece for your CEX vs. DEX statistical arbitrage models
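Time alignment is still worth confirming on your own copy before training, because missing 5-minute bars silently shift any lagged feature. Below is a minimal pandas sketch. It assumes `timestamp_utc` parses as UTC and that bars should sit on a regular 5-minute grid. `find_gaps` is a hypothetical helper, not part of the dataset.

```python
from __future__ import annotations

import pandas as pd


def find_gaps(timestamps: pd.Series, freq: str = "5min") -> tuple[pd.DataFrame, int]:
    """Locate missing bars in one instrument's timestamp_utc column.

    Returns a table of gaps plus the number of timestamps that are not
    aligned to the expected bar grid.
    """
    ts = pd.to_datetime(timestamps, utc=True).sort_values().reset_index(drop=True)
    step = pd.Timedelta(freq)

    # Bars whose timestamp is not an exact multiple of the bar length.
    off_grid = int((ts.dt.floor(freq) != ts).sum())

    # Any jump larger than one bar between neighbours means bars are missing.
    prev = ts.shift(1)
    mask = (ts - prev) > step
    gaps = pd.DataFrame({
        "last_bar_before_gap": prev[mask],
        "first_bar_after_gap": ts[mask],
    }).reset_index(drop=True)
    gaps["missing_bars"] = (
        (gaps["first_bar_after_gap"] - gaps["last_bar_before_gap"]) // step
    ) - 1
    return gaps, off_grid
```

Run it separately for each `instrument_symbol`. A non-zero `off_grid` count points to misaligned timestamps rather than missing data.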

Hyperliquid Orderbook Data

Liquidity profiles are sourced and derived from Hyperliquid — the leading L1 Perpetual Decentralized Exchange — capturing the most active on-chain derivatives orderbook flow.

24 Instruments · 12+ Months History · ~2.3M Total Bars · 47 Columns Per Row
Schema

Every Row. Every Field. Documented.

Each row = one 5-minute aggregated perpetual futures orderbook snapshot — ready for crypto backtesting and deep learning

Column | Type | Description
timestamp_utc | DateTime | ISO 8601 UTC timestamp
instrument_symbol | String | Trading pair (e.g., BTC-USDT)
open_price | Float | Mid-price at bar open
high_price | Float | Highest mid-price in bar
low_price | Float | Lowest mid-price in bar
close_price | Float | Mid-price at bar close
interval_traded_volume | Float | Taker flow volume proxy
bid_volume_level_1..10 | Float | Cumulative passive bid volume
ask_volume_level_1..10 | Float | Cumulative passive ask volume
bid_distance_level_1..10 | Float | Distance from mid-price (bps)
ask_distance_level_1..10 | Float | Distance from mid-price (bps)
Custom Configurations → Higher resolution and deeper depth profiles available on request.
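The `..10` shorthand in the table stands for ten numbered columns per field. Seven base columns plus 4 × 10 depth columns gives the 47 columns per row. The sketch below expands the shorthand and checks a file header against it. It assumes the suffixes run `_level_1` through `_level_10`, and the helper names are hypothetical.

```python
from __future__ import annotations

DEPTH_LEVELS = 10  # depth levels in the 5m Standard release

# Non-depth columns, in schema-table order.
BASE_COLUMNS = [
    "timestamp_utc",
    "instrument_symbol",
    "open_price",
    "high_price",
    "low_price",
    "close_price",
    "interval_traded_volume",
]
DEPTH_FIELDS = ("bid_volume", "ask_volume", "bid_distance", "ask_distance")


def expected_columns(levels: int = DEPTH_LEVELS) -> list[str]:
    """Expand the '<field>_level_1..N' shorthand into explicit column names."""
    depth = [f"{field}_level_{k}" for field in DEPTH_FIELDS for k in range(1, levels + 1)]
    return BASE_COLUMNS + depth


def missing_columns(header: list[str], levels: int = DEPTH_LEVELS) -> list[str]:
    """Return expected columns that are absent from a file header (order ignored)."""
    present = set(header)
    return [col for col in expected_columns(levels) if col not in present]
```

To use it, read only the header, for example `list(pd.read_csv(path, nrows=0).columns)`, and pass that list to `missing_columns`.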
Use Cases

Built For Quantitative Minds

📐

Quant Researchers

Build order flow features for LSTM, Transformer, and Reinforcement Learning (RL) trading environments without months of data engineering.

Execution Algos

Backtest TWAP, VWAP, and iceberg strategies against real depth profiles with institutional backtesting data. Estimate slippage pre-deployment.
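You can approximate slippage by walking the ask side of a bar's depth profile with a hypothetical market buy. This minimal sketch makes three assumptions:

- `ask_volume_level_k` is cumulative volume through level k, in the same units as the order size.
- `ask_distance_level_k` is a positive bps distance from mid.
- All volume added at level k rests at that distance.

The result is an average cost in bps relative to mid. It reflects only the displayed depth in one 5-minute snapshot, not queue dynamics or market impact.

```python
from typing import Optional, Sequence


def estimate_buy_slippage_bps(
    cum_ask_volume: Sequence[float],
    ask_distance_bps: Sequence[float],
    order_size: float,
) -> Optional[float]:
    """Average distance from mid (bps) paid by a market buy that walks the ask side.

    Returns None if the order is larger than all displayed levels combined.
    """
    if order_size <= 0:
        raise ValueError("order_size must be positive")
    remaining = order_size
    cost = 0.0  # running sum of (volume filled x distance in bps)
    prev_cum = 0.0
    for cum, dist in zip(cum_ask_volume, ask_distance_bps):
        level_volume = max(cum - prev_cum, 0.0)  # undo the cumulative sum
        prev_cum = cum
        filled = min(remaining, level_volume)
        cost += filled * dist
        remaining -= filled
        if remaining <= 0:
            return cost / order_size
    return None  # visible depth exhausted before the order was filled
```

For one row, pass `[row[f"ask_volume_level_{k}"] for k in range(1, 11)]` and the matching `ask_distance_level_k` values. A sell is the mirror image on the bid columns.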

🏦

Market Makers

Study bid-ask dynamics, quote density, and liquidity provision patterns across 24 instruments.
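Here are two quick features along those lines, sketched in pandas. The sketch assumes both level-1 distances are positive bps measured outward from mid, so together they add up to the quoted spread. `spread_bps` and the `*_density` columns are illustrative names, not part of the schema.

```python
import pandas as pd


def add_market_maker_features(df: pd.DataFrame, levels: int = 10) -> pd.DataFrame:
    """Add quoted spread and per-side quote density columns (illustrative names)."""
    out = df.copy()
    # Both level-1 distances point away from mid, so together they span the spread.
    out["spread_bps"] = out["bid_distance_level_1"] + out["ask_distance_level_1"]
    # Visible volume per bp of book, measured out to the deepest level.
    for side in ("bid", "ask"):
        out[f"{side}_density"] = (
            out[f"{side}_volume_level_{levels}"] / out[f"{side}_distance_level_{levels}"]
        )
    return out
```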

🎓

Academics

Institutional-quality crypto data for quantitative research, without exchange partnerships or Bloomberg terminals.

Pricing

One-Time Purchase. No Subscriptions.

Choose your resolution. Delivered instantly after payment. Compressed CSV.

5m Standard: 5-min / 10-level depth / 47 columns
Single Instrument
  • 1 of 24 instruments · ~96K bars
  • Personal license · Non-commercial
$99
Full Dataset — 24 Instruments
  • All 24 instruments · ~2.3M bars
  • Team (10 users) · Commercial use
  • Live strategy feeding · Priority support
$999
NEW
1m Pro: 1-min / 30-level depth / 121 columns
Single Instrument
  • 1 of 24 instruments · ~471K bars
  • Personal license · Non-commercial
Full Bundle — 24 Instruments
  • All 24 instruments · ~11.3M bars
  • ~7.4 GB total depth data
  • Team (10 users) · Commercial use
$1,499

Need redistribution rights, custom data, or API access?

Contact us for Enterprise pricing ($5,000+)
Dataset Coverage

24 Instruments. March 2025 – February 2026.

Capturing the 2025/2026 crypto market transition — bull runs, corrections, and regime changes.

BTC · ETH · SOL · BNB · XRP · DOGE · ADA · AVAX · LINK · DOT · NEAR · SUI · APT · ARB · OP · INJ · SEI · TIA · WIF · FIL · LTC · ETC · ATOM · XLM: ~96K bars each
Time Range: March 2025 → Feb 2026 (~12 months continuous coverage)
Per Instrument: ~96,000 bars (5-minute intervals, 47 columns each)
Total Dataset: ~2.3 million rows (10-level depth × bid & ask sides)

Need Different Data?

We can generate orderbook depth data for any cryptocurrency, at any candle interval (1m, 5m, 15m, 1h, etc.), covering any time period. Custom depth levels (up to 30) and exchange selection available.

Request Custom Dataset →
Free Sample

Try Before You Buy

7-day sample of all 24 instruments. 47 columns. 5-minute resolution.

Just want a quick look? Download BTC sample directly — no email, no spam.

↓ BTC 7-Day Sample (CSV, 739 KB)

Want all 24 instruments? Enter your email to receive the full sample pack.

Why Orderbook Data

Market Microstructure Data: OHLCV Is Not Enough.

Traditional candlestick data tells you what happened. Orderbook depth data tells you why it happened — and what's about to happen next.

🔬

See the Invisible Forces

Every price candle hides a storm of bid-ask dynamics. Large institutional orders, spoofing walls, and liquidity vacuums shape price action — but are invisible in OHLCV data. Our 10-level depth profiles expose the microstructure behind every 5-minute bar: cumulative passive liquidity on both sides, measured in actual volume and basis-point distance from mid-price.
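The volume columns are cumulative, so the liquidity resting at each individual level is the first difference across levels. Here is a minimal numpy/pandas sketch that assumes this cumulative reading of the schema. The output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def per_level_volume(df: pd.DataFrame, side: str, levels: int = 10) -> pd.DataFrame:
    """Convert cumulative <side>_volume_level_k columns into volume at each level."""
    cols = [f"{side}_volume_level_{k}" for k in range(1, levels + 1)]
    cum = df[cols].to_numpy(dtype=float)
    # Difference along the level axis; prepending 0 keeps level 1 unchanged.
    marginal = np.diff(cum, axis=1, prepend=0.0)
    return pd.DataFrame(
        marginal,
        index=df.index,
        columns=[f"{side}_level_volume_{k}" for k in range(1, levels + 1)],
    )
```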

🧠

Deep Learning & ML-Ready Feature Set

47 pre-computed columns per row means you skip months of data engineering. Feed directly into LSTM networks, Transformer architectures, reinforcement learning environments, or gradient-boosted models. The bid-ask imbalance ratio — widely cited in academic microstructure literature — is computable in one line from our schema. No WebSocket parsing, no clock-drift alignment, no GPU-intensive normalization.
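One common formulation of the imbalance ratio is (bid − ask) / (bid + ask). It ranges from −1 (all ask) to +1 (all bid). The sketch below assumes the level-k columns hold total volume through the top k levels, so the same one-liner gives imbalance at any depth.

```python
import pandas as pd


def depth_imbalance(df: pd.DataFrame, level: int = 1) -> pd.Series:
    """(bid - ask) / (bid + ask) on cumulative volume through `level`; in [-1, 1]."""
    bid = df[f"bid_volume_level_{level}"]
    ask = df[f"ask_volume_level_{level}"]
    return (bid - ask) / (bid + ask)
```

For a full imbalance profile, evaluate it at every depth, e.g. `{k: depth_imbalance(df, k) for k in range(1, 11)}`.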

⚖️

Legally Clean IP

Raw Level 2 orderbook data from centralized exchanges is often restricted by Terms of Service from redistribution. Our datasets are classified as Derived Data — aggregated, transformed, and mathematically computed from raw inputs. The original tick-level snapshots are not recoverable or reverse-engineerable. You get institutional-quality depth intelligence without the legal risk.

From Data To Alpha

Your Statistical Arbitrage Research Pipeline, Accelerated

📂 01

Load

Import CSV.gz directly into Pandas, DuckDB, or Spark. No parsing, no cleaning needed.
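Here is a minimal loading sketch with pandas, which infers gzip compression from the `.gz` extension. It assumes `timestamp_utc` holds ISO 8601 strings, as documented in the schema. `load_depth_file` is a hypothetical helper name.

```python
import pandas as pd


def load_depth_file(path: str) -> pd.DataFrame:
    """Load one instrument's .csv.gz file, indexed and sorted by UTC timestamp."""
    df = pd.read_csv(path)  # compression="infer" (the default) handles .gz
    df["timestamp_utc"] = pd.to_datetime(df["timestamp_utc"], utc=True)
    return df.sort_values("timestamp_utc").set_index("timestamp_utc")
```

Usage: `btc = load_depth_file("BTC_5m_depth10_derived.csv.gz")`. If you prefer SQL, DuckDB can query the same file in place through its `read_csv_auto` table function.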

⚙️ 02

Engineer

Compute bid-ask imbalance, depth slope, liquidity concentration, and 100+ features from 47 raw columns.
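"Depth slope" and "liquidity concentration" can be defined in several ways. The sketch below uses one possible reading, an assumption rather than a standard definition:

- Depth slope is the row-wise least-squares slope of cumulative volume against bps distance (volume added per bp of book).
- Concentration is the share of level-10 cumulative volume already present at level 3.

```python
import numpy as np
import pandas as pd


def depth_shape_features(
    df: pd.DataFrame, side: str, levels: int = 10, top: int = 3
) -> pd.DataFrame:
    """Row-wise depth slope and top-of-book concentration (illustrative definitions)."""
    vol = df[[f"{side}_volume_level_{k}" for k in range(1, levels + 1)]].to_numpy(dtype=float)
    dist = df[[f"{side}_distance_level_{k}" for k in range(1, levels + 1)]].to_numpy(dtype=float)
    # Least-squares slope of cumulative volume on distance, computed per row.
    x = dist - dist.mean(axis=1, keepdims=True)
    y = vol - vol.mean(axis=1, keepdims=True)
    slope = (x * y).sum(axis=1) / (x ** 2).sum(axis=1)
    # Fraction of the deepest cumulative volume already reached by level `top`.
    concentration = vol[:, top - 1] / vol[:, levels - 1]
    return pd.DataFrame(
        {f"{side}_depth_slope": slope, f"{side}_top{top}_share": concentration},
        index=df.index,
    )
```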

🤖 03

Train

Feed into RL environments, LSTM/Transformers, or XGBoost. 2.3M rows: 12 months of replay data across 24 instruments.
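For LSTM or Transformer training, bars are usually cut into fixed-length windows and split by time rather than shuffled. Here is a minimal numpy sketch. It assumes features and target are already aligned row by row for one instrument. The window length, split fraction, and helper names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_sequences(features: np.ndarray, target: np.ndarray, window: int):
    """Build (samples, window, n_features) inputs and next-bar targets.

    X[i] covers bars i .. i+window-1; y[i] is the target at bar i+window.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    # sliding_window_view appends the window axis last: (T-window+1, F, window).
    windows = sliding_window_view(features, window_shape=window, axis=0)
    windows = windows.transpose(0, 2, 1)  # -> (T-window+1, window, F)
    X = np.ascontiguousarray(windows[:-1])  # the last window has no next bar
    y = target[window:]
    return X, y


def chronological_split(X: np.ndarray, y: np.ndarray, train_frac: float = 0.8):
    """Earlier samples train, later samples test; no shuffling."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Splitting by position keeps every test target later in time than every training target, which a random shuffle would not.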

🚀 04

Deploy

Backtest slippage-aware strategies against real depth. Validate before risking capital on live markets.

Our Story

Built by a Quant, for Quants.

How on-chain orderbook data changed everything.

Hi. I'm the creator of Imbalance Labs. Outside of my day job, I spend my time researching financial markets using Reinforcement Learning algorithms.

A while back, I hit a wall. I was feeding my RL agents standard OHLCV candle data. The math checked out, but my models were blind to what actually matters: real liquidity, spread, and slippage. I quickly realized that in the context of AI-driven trading, traditional candles are nothing more than a liquidity illusion.

RL models are ruthless. If you don't give them market intent context, they make naïve decisions. I knew that to level up, I needed to give my trading bot full visibility into market microstructure — historical Level 2 orderbook data.

That's when the real engineering nightmare began. Getting clean, historical L2 depth data from leading DEXs is borderline impossible for most researchers. Instead of training models, I spent weeks building data infrastructure — a proprietary pipeline that ingests raw exchange data, cleans it, normalizes the depth profiles, and compresses it into analysis-ready formats.

When I finally plugged the finished dataset into my bot's training environment, the difference in learning quality and risk management was massive.

That's when it hit me: if I, as a hobbyist, needed this data badly enough to spend weeks building infrastructure to process it — professional analysts, quants, and researchers are certainly fighting the same battle.

That's how Imbalance Labs was born. I did the worst, most tedious infrastructure work so you don't have to. Download the data and jump straight to what matters — training models and testing strategies.

FAQ

Frequently Asked Questions

What is orderbook depth data and how is it different from OHLCV?

+
OHLCV (Open-High-Low-Close-Volume) candles only show price and volume at a surface level. Orderbook depth data reveals the actual bid and ask liquidity sitting at multiple price levels — showing you how much buying and selling pressure exists and how far from the market price. Our dataset provides 10 levels of depth on both sides, measured in cumulative volume and basis-point distance from mid-price. This is the raw material that institutional traders and quantitative researchers use to detect order flow, predict short-term price movements, and model realistic execution costs.

Which instruments and timeframe does the dataset cover?

+
The current release covers 24 major crypto perpetual futures sourced from the leading L1 Perpetual Decentralized Exchange: BTC, ETH, SOL, BNB, XRP, DOGE, ADA, AVAX, LINK, DOT, NEAR, SUI, OP, ARB, SEI, TIA, INJ, APT, FIL, LTC, ETC, WIF, XLM, and ATOM. All data is aggregated into 5-minute bars spanning from March 2025 to February 2026 — approximately 12 months of continuous coverage. Each instrument has ~96,000 bars with 47 derived columns per row, including OHLCV, bid/ask volumes, and bid/ask distances at 10 depth levels.

Can I use this data for machine learning and deep learning?

+
Absolutely. The dataset is specifically designed for ML workflows. The 47-column schema feeds directly into LSTM networks, Transformer architectures, reinforcement learning environments (like Gymnasium/Stable-Baselines3), and gradient-boosted models (XGBoost, LightGBM). Common derived features include bid-ask imbalance ratios, depth-weighted pressure scores, liquidity concentration metrics, and spread dynamics — all computable in a few lines of Python from our raw columns.
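"Depth-weighted pressure" has no single standard definition. One illustrative option is sketched below; it is an assumption, not the dataset's or any library's definition. Each level's volume is weighted by 1 / (1 + distance in bps), so liquidity near mid counts more. The two sides are then compared the same way as the imbalance ratio.

```python
import numpy as np
import pandas as pd


def depth_weighted_pressure(df: pd.DataFrame, levels: int = 10) -> pd.Series:
    """Bid-vs-ask pressure with each level's volume discounted by 1 / (1 + distance_bps).

    Illustrative definition only; ranges from -1 (all ask) to +1 (all bid).
    """

    def weighted_side(side: str) -> np.ndarray:
        cum = df[[f"{side}_volume_level_{k}" for k in range(1, levels + 1)]].to_numpy(dtype=float)
        dist = df[[f"{side}_distance_level_{k}" for k in range(1, levels + 1)]].to_numpy(dtype=float)
        per_level = np.diff(cum, axis=1, prepend=0.0)  # undo the cumulative sum
        return (per_level / (1.0 + dist)).sum(axis=1)

    bid, ask = weighted_side("bid"), weighted_side("ask")
    return pd.Series((bid - ask) / (bid + ask), index=df.index, name="depth_weighted_pressure")
```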

Can you create custom datasets with different instruments, timeframes, or depth levels?

+
Yes. We can generate orderbook depth data for any cryptocurrency available on major derivatives DEXs and CEXs, at any candle interval (1-minute, 5-minute, 15-minute, 1-hour, etc.), covering any historical time period. We also support extended depth up to 30 levels. Contact us at imbalancelabs@gmail.com with your requirements for a custom quote.

Is this data legally safe to use?

+
Yes. Our datasets are classified as Derived Data — an Aggregated Liquidity and Orderbook Depth Index. All raw order book snapshots have been aggregated across time intervals, normalized, and transformed through statistical computations. The original tick-level data is not recoverable from our product. This classification places it outside the scope of most exchange data redistribution restrictions.

What format is the data delivered in?

+
All datasets are delivered as compressed CSV files (.csv.gz), which can be loaded directly by Python pandas, R, DuckDB, Apache Spark, and most data analysis tools. Each file is named by instrument (e.g., BTC_5m_depth10_derived.csv.gz). The full dataset ZIP containing all 24 instruments is approximately 300 MB.
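To combine per-instrument files, the sketch below selects files by name and concatenates them. Its regular expression generalizes the single example name `BTC_5m_depth10_derived.csv.gz`, which is an assumption about the naming convention. `load_all` is a hypothetical helper. It relies on the documented `instrument_symbol` and `timestamp_utc` columns for ordering.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed pattern, generalized from the example name BTC_5m_depth10_derived.csv.gz.
FILE_PATTERN = re.compile(
    r"^(?P<symbol>[A-Z0-9]+)_(?P<interval>\d+[mh])_depth(?P<levels>\d+)_derived\.csv\.gz$"
)


def load_all(directory: str, interval: str = "5m", levels: int = 10) -> pd.DataFrame:
    """Concatenate every matching instrument file in `directory`."""
    frames = []
    for path in sorted(Path(directory).iterdir()):
        m = FILE_PATTERN.match(path.name)
        # Skip files that don't follow the convention or belong to another release.
        if m is None or m["interval"] != interval or int(m["levels"]) != levels:
            continue
        frames.append(pd.read_csv(path))  # gzip compression inferred from .gz
    if not frames:
        raise FileNotFoundError(f"no {interval}/depth{levels} files in {directory}")
    combined = pd.concat(frames, ignore_index=True)
    combined["timestamp_utc"] = pd.to_datetime(combined["timestamp_utc"], utc=True)
    return combined.sort_values(["instrument_symbol", "timestamp_utc"], ignore_index=True)
```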

Legal Disclaimer

The datasets distributed by Imbalance Labs constitute an Aggregated Liquidity and Orderbook Depth Index — a proprietary, mathematically derived analytical product. All raw order book data has been independently collected, aggregated across time intervals, normalized to mid-price reference frames, and transformed through statistical computations (cumulative volume aggregation, basis-point distance normalization).

This product is classified as Derived Data under standard market data licensing frameworks. It does not constitute, reproduce, or redistribute any raw, unmodified exchange data stream. The original tick-level order book snapshots are not included, recoverable, or reverse-engineerable from this dataset.

Imbalance Labs is not affiliated with, endorsed by, or officially connected to any cryptocurrency exchange or decentralized protocol. All exchange names and trademarks are the property of their respective owners.

This data is provided for research, backtesting, and analytical purposes only. It does not constitute financial advice, trading signals, or investment recommendations. Users assume full responsibility for any trading or investment decisions made using this data.