Every Kalshi trade. Every Polymarket fill. Every Manifold bet.
Reconciled.
A normalized cross-venue historical archive of prediction markets in a single Postgres schema, with a daily reconciliation log against venue-published volume and full retention of resolved and pruned markets. Built for quants who want to know exactly what their data does and doesn't cover.
Three venues. One schema. Honest about every gap.
Most data vendors hide the residual drift. We publish it. Below is the reconciliation state of the Phase-0 dataset — the same numbers a customer sees in the recon log.
- Volume drift: 0.0%
- Markets sampled: 50
- Trade reconciliation: 100% (50/50)
- Source: public REST

- Perfect reconcile: 30/40
- Residual drift: ~10% (active markets)
- Cause: fills[] aggregation
- Source: public REST + cursor

- Resolution coverage: 50/50 closed markets
- Standard CTF markets: Goldsky subgraph
- NegRisk-wrapped markets: Polygon RPC direct
- data-api cap busted: 3,500 → unlimited
Polymarket's data-api caps pagination at offset 3,500 trades per market, so the full history of a high-volume market can't be recovered from REST alone. We bust that ceiling by reading the Goldsky orderbook subgraph, plus a direct Polygon RPC layer for NegRisk-wrapped markets the subgraph doesn't index.
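A minimal sketch of the routing decision described above, assuming the cap behaves like a hard limit on the `offset` parameter (a page at offset 3,500 is still served, anything past it is not) and a page size of 500. `fetch_via_rest`, `backfill_source`, and the injected fetcher are hypothetical names, and the sketch assumes REST is kept whenever it returns the complete history. Injecting the fetcher lets the logic run without calling the real data-api.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: the data-api accepts offsets up to and including 3,500.
OFFSET_CAP = 3_500

Page = List[Dict]
Fetcher = Callable[[int, int], Page]  # hypothetical: (offset, limit) -> trades


def fetch_via_rest(fetch_page: Fetcher, page_size: int = 500) -> Tuple[Page, bool]:
    """Page through one market's trades by offset.

    Returns (trades, possibly_truncated). possibly_truncated is True when every
    page allowed under the cap came back full, so further trades may exist
    beyond what REST can reach.
    """
    trades: Page = []
    offset = 0
    while offset <= OFFSET_CAP:
        page = fetch_page(offset, page_size)
        trades.extend(page)
        if len(page) < page_size:
            return trades, False  # short page: the history is exhausted
        offset += page_size
    return trades, True  # all allowed pages were full: REST can't go further


def backfill_source(possibly_truncated: bool, neg_risk: bool) -> str:
    """Pick where a market's full trade history comes from (hypothetical labels)."""
    if not possibly_truncated:
        return "rest"
    # Standard CTF markets: Goldsky orderbook subgraph. NegRisk-wrapped markets,
    # which the subgraph doesn't index, are read from Polygon RPC directly.
    return "polygon_rpc" if neg_risk else "goldsky_subgraph"
```

With a fake fetcher over 10,000 trades, `fetch_via_rest` stops after eight full pages (offsets 0 through 3,500, so 4,000 trades) and flags the market for backfill.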
One canonical model across venues.
Six tables, with every venue mapped to the same shape. Binary or categorical, on-chain or off-chain: one query covers them all.
-- markets are globally addressable: <venue>:<native_id>
SELECT m.market_id, m.venue_id, m.title,
       m.volume_native, m.volume_unit,
       m.resolved_at, m.closes_at
FROM markets m
WHERE m.venue_id = 'kalshi'
  AND m.resolved_at IS NOT NULL
ORDER BY m.volume_native DESC
LIMIT 10;
SQL against the canonical Postgres schema.
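The `<venue>:<native_id>` convention from the query comment is easy to handle client-side. This is a hypothetical helper, not part of the predmarket SDK; it splits on the first colon only, on the assumption that a venue name never contains a colon while a native ID might.

```python
from typing import NamedTuple


class MarketId(NamedTuple):
    venue: str
    native_id: str


def make_market_id(venue: str, native_id: str) -> str:
    """Build a globally addressable ID of the form '<venue>:<native_id>'."""
    if not venue or ":" in venue:
        raise ValueError(f"invalid venue: {venue!r}")
    if not native_id:
        raise ValueError("native_id must be non-empty")
    return f"{venue}:{native_id}"


def parse_market_id(market_id: str) -> MarketId:
    """Split '<venue>:<native_id>' on the first colon only."""
    venue, sep, native_id = market_id.partition(":")
    if not sep or not venue or not native_id:
        raise ValueError(f"not a <venue>:<native_id> string: {market_id!r}")
    return MarketId(venue, native_id)
```

Splitting on the first colon keeps any colons inside the native ID intact, so `parse_market_id(make_market_id(v, n))` round-trips for any valid pair.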
from predmarket import Predmarket
pm = Predmarket(api_key="YOUR_KEY")
# list up to 100 markets from one venue
markets = pm.markets(venue="polymarket", limit=100)
# full trade history for one market
trades = pm.trades(market_id="polymarket:0xdd224...")
Python SDK — pip install predmarket (early access).
Survivorship-bias-free. Reconciled. Open about the failure modes.
Resolved + pruned markets retained
When a venue closes or archives a market, most scrapers lose it. We keep the last-known snapshot in a deletion ledger so backtests see the world as it actually was.
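To see why retaining pruned markets matters, here is a hypothetical point-in-time filter: a backtest dated `as_of` should see every market that was listed then, including ones the venue later pruned. The `listed_at` and `removed_at` fields are illustrative assumptions, not the documented deletion-ledger schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MarketRecord:
    market_id: str
    listed_at: datetime
    # Hypothetical: when the venue pruned or archived the market; None if still listed.
    removed_at: Optional[datetime] = None


def universe_at(markets: Iterable[MarketRecord], as_of: datetime) -> List[str]:
    """IDs of markets listed at `as_of`, whether or not the venue still lists them."""
    return [
        m.market_id
        for m in markets
        if m.listed_at <= as_of and (m.removed_at is None or m.removed_at > as_of)
    ]
```

A scraper that only sees today's listings has no `removed_at` rows to keep, so a pruned market silently drops out of every historical universe. That is the survivorship bias the ledger is there to prevent.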
Daily drift report vs venue volume
Every coverage claim is backed by a row in recon_log. We publish the drift, the threshold, and whether the gate passed. No claim without evidence.
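A minimal sketch of what one recon_log row could record, assuming drift is the relative difference between archived volume and venue-published volume, and the gate passes when drift is at or below the threshold. The field names and the 1% default threshold are placeholders, not the published schema or values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconRow:
    market_id: str
    archived_volume: float
    venue_volume: float
    drift: float  # |archived - venue| / |venue|
    threshold: float
    passed: bool


def reconcile(market_id: str, archived_volume: float, venue_volume: float,
              threshold: float = 0.01) -> ReconRow:
    """Compare our summed volume with the venue's figure and apply the gate."""
    if venue_volume == 0:
        # No venue volume: only an empty archive counts as a match.
        drift = 0.0 if archived_volume == 0 else math.inf
    else:
        drift = abs(archived_volume - venue_volume) / abs(venue_volume)
    return ReconRow(market_id, archived_volume, venue_volume, drift,
                    threshold, drift <= threshold)
```

Under these assumptions an archive holding 900 units against a venue figure of 1,000 records a drift of 0.1 and fails a 1% gate.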
Public docs of every quirk
Manifold's volume sums abs(amount). Polymarket's data-api caps at offset 3,500. NegRisk markets need Polygon RPC. We write up the post-mortems so you don't have to discover these quirks yourself.
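The Manifold quirk, sketched: assuming sells appear as bets with a negative `amount`, a signed sum nets buys against sells and understates volume, while summing `abs(amount)` counts both sides. The dict shape is a simplified assumption; only the `amount` field is used.

```python
from typing import Iterable, Mapping


def naive_volume(bets: Iterable[Mapping[str, float]]) -> float:
    """Signed sum: buys and sells cancel, so traded volume is understated."""
    return sum(b["amount"] for b in bets)


def manifold_volume(bets: Iterable[Mapping[str, float]]) -> float:
    """Volume as the sum of absolute bet amounts."""
    return sum(abs(b["amount"]) for b in bets)


bets = [{"amount": 100.0}, {"amount": -40.0}, {"amount": 25.0}]
print(naive_volume(bets), manifold_volume(bets))  # 85.0 165.0
```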
Get the sample: 5 markets, ~24 KB, verified with three readers.
Parquet files for markets, outcomes, and trades, readable by pandas, polars, and duckdb. Free for evaluation. Email us with your use case and we'll send the link, plus a sketch of what production access would cost for your workload.