fix(polymarket_polygon): is_taker_side flag + apply filter in ohlcv to fix volume double-counting #9684

Merged: jeff-dude merged 5 commits into main from fix/polymarket-taker-side-dedup on May 19, 2026
@los-xyz los-xyz commented May 19, 2026

Summary

Polymarket's CLOB contracts emit two OrderFilled events per match (one for the maker leg, one for the taker leg), each carrying the full fill amount. Naive SUM(amount) over polymarket_polygon.market_trades therefore over-reports volume by ~2x. This is the double-counting documented by Paradigm in Dec 2025.
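
A minimal sketch of the double-counting described above, using Python's built-in sqlite3 and a toy table. The match_id and leg columns and all values are hypothetical illustration data, not the real market_trades schema; only the amount column is named in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for market_trades; match_id and leg are hypothetical columns.
conn.execute("CREATE TABLE market_trades (match_id INTEGER, leg TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO market_trades VALUES (?, ?, ?)",
    [
        # Each match produces two rows, and both carry the full fill amount.
        (1, "maker", 100.0), (1, "taker", 100.0),
        (2, "maker", 250.0), (2, "taker", 250.0),
    ],
)

# Naive aggregate over every row counts each match twice.
naive_volume = conn.execute("SELECT SUM(amount) FROM market_trades").fetchone()[0]
# Counting one leg per match gives the one-sided volume.
one_sided_volume = conn.execute(
    "SELECT SUM(amount) FROM market_trades WHERE leg = 'taker'"
).fetchone()[0]

print(naive_volume, one_sided_volume)  # 700.0 350.0
```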

This PR:

  • market_trades_raw + market_trades: adds a new column is_taker_side BOOLEAN. TRUE on the taker leg of each match. All rows preserved (no row-count change), so users who want the raw OrderFilled stream still have it.
  • ohlcv_hourly: filters to is_taker_side = true at the base CTE. volume_usd, volume_contracts, and trade_count halve to Polymarket-canonical values. open / high / low / close / vwap are unchanged (price is per-trade and symmetric across both legs, and vwap is a ratio whose numerator and denominator halve together).

Predicate: taker IN (V1 CTF 0x4bfb...82e, V1 NegRisk 0xc5d5...80a, V2 CTF 0xe111...996b, V2 NegRisk 0xe222...0f59).
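
The predicate can be sketched on toy data as follows. This is an illustration under assumptions, not the model's SQL: the exchange addresses are abbreviated in this PR, so the strings below are hypothetical placeholders, and the order_filled table and its rows are invented for the example.

```python
import sqlite3

# Hypothetical placeholders: the real exchange contract addresses are
# abbreviated in the PR text and are not reproduced here.
EXCHANGE_CONTRACTS = ("0xEXCHANGE_A", "0xEXCHANGE_B")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_filled (maker TEXT, taker TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO order_filled VALUES (?, ?, ?)",
    [
        ("0xalice", "0xbob", 100.0),         # maker leg of match 1
        ("0xbob", "0xEXCHANGE_A", 100.0),    # taker leg: taker is an exchange contract
        ("0xcarol", "0xdave", 250.0),        # maker leg of match 2
        ("0xdave", "0xEXCHANGE_B", 250.0),   # taker leg of match 2
    ],
)

# Flag every row instead of dropping any, mirroring "all rows preserved".
placeholders = ", ".join("?" for _ in EXCHANGE_CONTRACTS)
flagged = conn.execute(
    f"SELECT maker, taker, amount, taker IN ({placeholders}) AS is_taker_side "
    "FROM order_filled",
    EXCHANGE_CONTRACTS,
).fetchall()

row_count = len(flagged)
taker_side_volume = sum(amount for _, _, amount, is_taker in flagged if is_taker)
print(row_count, taker_side_volume)  # 4 350.0
```

Keeping the flag as a column rather than a filter means consumers opt in to deduplication with a WHERE clause, while the raw event stream stays queryable.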

Validation

Cell-by-cell match against the Polymarket-authored reference query 6899861 across 17 months (Jan 2025 to May 2026). Deltas are floating-point noise (absolute differences of about 1e-6 or less on monthly totals in the billions of dollars). Sample:

| Month | Reference query (Polymarket) | New is_taker_side filter | Delta |
| --- | --- | --- | --- |
| 2025-01 | $711,931,402.49 | $711,931,402.49 | 0 |
| 2025-12 | $2,377,961,354.40 | $2,377,961,354.40 | 4.77e-7 |
| 2026-03 | $4,984,078,339.64 | $4,984,078,339.64 | -9.54e-7 |
| 2026-04 | $4,212,805,014.74 | $4,212,805,014.74 | -1.43e-6 |
| 2026-04 notional (shares) | 9.0086B | 9.0086B | 0 |
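
As a rough check that deltas of this size are plausible rounding noise, the sketch below expresses each reported delta in units of float64 spacing (ULP) at the magnitude of the monthly total. It assumes the totals are computed as 64-bit doubles, which the PR does not state, and it uses the rounded figures from the table rather than the underlying query output.

```python
import math

# (monthly total in USD, reported absolute delta) taken from the sample table.
samples = {
    "2025-12": (2_377_961_354.40, 4.77e-7),
    "2026-03": (4_984_078_339.64, -9.54e-7),
    "2026-04": (4_212_805_014.74, -1.43e-6),
}

# math.ulp(x) is the gap between x and the next representable float64.
deltas_in_ulps = {
    month: round(abs(delta) / math.ulp(total))
    for month, (total, delta) in samples.items()
}
print(deltas_in_ulps)  # {'2025-12': 1, '2026-03': 1, '2026-04': 3}
```

Under that assumption, every sampled delta is within a few ULPs of the total, which is what a different summation order would be expected to produce.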

The 9.0086B April 2026 notional is consistent with the publicly cited "Polymarket posted $9 billion in notional terms" figure at the precision that figure is quoted.

Verification query: 7538562.

Downstream impact

| Spell / consumer | Effect |
| --- | --- |
| polymarket_polygon.market_trades | New column. Existing columns unchanged. Row count unchanged. |
| polymarket_polygon.ohlcv_hourly | volume_usd, volume_contracts, trade_count halve. Prices unchanged. |
| prediction_markets.ohlcv_hourly (curated-data) | Auto-corrects on rebuild. Same shape change. |
| prediction_markets.trades (curated-data) | Will be updated in a follow-up PR to surface is_taker_side. All Kalshi rows will carry is_taker_side = TRUE (Kalshi has no double-counting) so the filter can be applied uniformly across venues. |
| External dashboards reading market_trades.amount without filter | Continue to show 2x. Methodology change worth a comms note (Paradigm + Allium + DefiLlama already moved to this convention). |
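
The ohlcv_hourly shape change can be illustrated with a toy aggregate. This is a sketch under assumptions: the trades table, its values, and the share-weighted vwap formula are hypothetical and may not match the actual model's definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (price REAL, shares REAL, is_taker_side INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        # Two matches, each recorded once per leg at the same price and size.
        (0.25, 100.0, 0), (0.25, 100.0, 1),
        (0.75, 300.0, 0), (0.75, 300.0, 1),
    ],
)

# Assumed definitions: vwap = SUM(price * shares) / SUM(shares).
AGG = """
SELECT MAX(price) AS high,
       MIN(price) AS low,
       SUM(shares) AS volume_contracts,
       COUNT(*) AS trade_count,
       SUM(price * shares) / SUM(shares) AS vwap
FROM trades {where}
"""
unfiltered = conn.execute(AGG.format(where="")).fetchone()
filtered = conn.execute(AGG.format(where="WHERE is_taker_side = 1")).fetchone()

print(unfiltered)  # (0.75, 0.25, 800.0, 4, 0.625)
print(filtered)    # (0.75, 0.25, 400.0, 2, 0.625)
```

Volume and trade count halve under the filter, while high, low, and the ratio-based vwap are identical, because both legs of a match share the same price and size.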

Test plan

  • dbt parse --warn-error-options error:all clean locally (matches CI's exact flag set).
  • dbt compile clean for market_trades_raw, market_trades, ohlcv_hourly.
  • Side-by-side match against Polymarket's reference methodology query 6899861 (17 months, see table above).
  • Notional shares cross-check against publicly-stated April 2026 number.
  • CI green on this PR.

🤖 Generated with Claude Code

Polymarket's CLOB contracts emit two OrderFilled events per match (maker
leg + taker leg), each carrying the full fill amount. A naive
SUM(amount) over market_trades double-counts volume by ~2x, which is
how every consumer of these spells has been reading volume to date.

This change:

1. market_trades_raw / market_trades: add is_taker_side BOOLEAN. TRUE on
   the taker leg of each match. Predicate: taker IN (V1 CTF
   0x4bfb...82e, V1 NegRisk 0xc5d5...80a, V2 CTF 0xe111...996b, V2
   NegRisk 0xe222...0f59). All rows preserved so users who want the
   raw event stream still have it.

2. ohlcv_hourly: filter to is_taker_side=true at the base CTE.
   volume_usd, volume_contracts, trade_count halve on Polymarket;
   open/high/low/close/vwap unchanged (price is per-trade, symmetric).
   This brings the OHLCV table in line with Polymarket's own published
   methodology.

Validation: cell-by-cell match against the Polymarket-authored
reference query 6899861 across 17 months (Jan 2025 to May 2026). Deltas
are in 1e-7 floating-point range. Notional shares for Apr 2026 = 9.0086B,
matches the publicly-cited "$9 billion notional" figure exactly.

Methodology source: https://www.paradigm.xyz/2025/12/polymarket-volume-is-being-double-counted

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

cursor Bot commented May 19, 2026

PR Summary

Medium Risk
Changes core Polymarket volume/trade-count aggregation by filtering to a new is_taker_side flag; downstream dashboards/metrics may shift (~50%) even though trade rows are preserved.

Overview
Fixes Polymarket CLOB volume double-counting by introducing a new boolean is_taker_side on market_trades_raw/market_trades, computed via taker IN (known exchange contract addresses) to identify the canonical taker-leg event per match.

Updates ohlcv_hourly to filter its base trade set to is_taker_side, bringing volume_usd, volume_contracts, and trade_count in line with one-sided (non-duplicated) methodology while leaving price OHLC logic unchanged. Documentation in _schema.yml is updated to describe the new column and recommended filtering.

Reviewed by Cursor Bugbot for commit 119aaa1.

github-actions bot marked this pull request as draft on May 19, 2026, 13:39
github-actions bot added the labels WIP (work in progress) and dbt: daily (covers the Daily dbt subproject) on May 19, 2026

Drop the methodology comment block in market_trades_raw and the inline
comment on the ohlcv_hourly filter. PR description carries the WHY.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
los-xyz marked this pull request as ready for review on May 19, 2026, 15:39
github-actions bot added the label ready-for-review (this PR development is complete, please review) and removed the label WIP (work in progress) on May 19, 2026
los-xyz and others added 3 commits May 19, 2026 18:29
…untime

The CI for this PR cancels at the 90-minute timeout because the temp
schema has no prior state, so is_incremental() is false and the modified
models scan the full historical OrderFilled history.

Add an else branch on each is_incremental() guard that floors the initial
build at the last 7 days. Prod incremental runs are unaffected (the
is_incremental() branch still uses the existing DBT_ENV_INCREMENTAL_TIME
predicate).

- market_trades_raw.sql: 3 union legs (V1 CTF, V1 NegRisk, V2)
- ohlcv_hourly.sql: base CTE and final SELECT
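
The branching described above can be modeled in a few lines. This is a language-neutral sketch in Python, not the dbt Jinja used in the models: scan_start, CI_FLOOR, and incremental_window are hypothetical names, and incremental_window stands in for whatever lookback the existing DBT_ENV_INCREMENTAL_TIME predicate applies, which this PR does not state.

```python
from datetime import datetime, timedelta, timezone

# Floor applied when there is no prior state (for example a fresh CI schema).
CI_FLOOR = timedelta(days=7)


def scan_start(is_incremental: bool, now: datetime, incremental_window: timedelta) -> datetime:
    """Return the earliest event time a build should scan."""
    if is_incremental:
        # Existing behavior: incremental runs keep their own lookback.
        return now - incremental_window
    # New else branch: an initial build reads only the last 7 days
    # instead of the full OrderFilled history.
    return now - CI_FLOOR


now = datetime(2026, 5, 19, tzinfo=timezone.utc)
hypothetical_window = timedelta(days=1)  # illustrative value only
print(scan_start(False, now, hypothetical_window))  # 2026-05-12 00:00:00+00:00
```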

Verified locally with `dbt --warn-error compile`.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jeff-dude self-assigned this on May 19, 2026
jeff-dude added the label ready-for-merging and removed the label ready-for-review (this PR development is complete, please review) on May 19, 2026
jeff-dude merged commit 79b6670 into main on May 19, 2026
6 of 7 checks passed
jeff-dude deleted the fix/polymarket-taker-side-dedup branch on May 19, 2026, 20:20
github-actions bot locked and limited conversation to collaborators on May 19, 2026