Generate TransitMatters travel-time benchmarks for rapid transit #85

Open
devinmatte wants to merge 2 commits into main from tm-travel-time-benchmarks

@devinmatte
Member

Summary

  • Adds chalicelib/benchmarks/tm_benchmarks.py — computes TM-defined travel-time benchmarks per (from, to) stop pair for all rapid-transit lines, writing to s3://tm-mbta-performance/Benchmarks-tm/traveltimes/{Color}.json.
  • Consumes the existing SlowZones archive (per-day p50 travel times + dwells, back to 2016) as the source of truth — no new LAMP reads, no new raw-event processing.
  • Runs as a monthly Chalice cron (the historical p50 barely moves week to week), with a new IAM policy scoped to read SlowZones/* and write Benchmarks-tm/*.

How it works

For each rapid-transit line:

  1. Load every adjacent stop-pair traveltime CSV and dwell CSV.
  2. Build a directed graph from adjacency filenames (handles branch splits like JFK → Ashmont/Braintree and the Green Line's shared trunk naturally).
  3. DFS forward from each stop, accumulating a per-day Series of move(current→next) + dwell(current). Origin's dwell is skipped (the dashboard measures travel time from origin departure to destination arrival, which excludes origin dwell but includes every intermediate dwell).
  4. At each reachable destination, take the median of the aligned per-day cumulative sums and round it up to the next multiple of 30s.
  5. Pairs with fewer than 365 aligned service-days fall back to the MBTA benchmark on the dashboard side (no entry written).
  6. When multiple paths reach the same stop (Green Line trunk), keep the minimum benchmark.
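Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the code in tm_benchmarks.py: the names build_graph, add_aligned and walk are hypothetical, plain dicts keyed by day stand in for the per-day Series, and the stop pairs are passed in directly because the archive's filename scheme is not shown here.

```python
from collections import defaultdict

# Assumed input shapes (hypothetical, for illustration only):
#   move[(a, b)][day] -> p50 seconds to travel from stop a to adjacent stop b
#   dwell[stop][day]  -> p50 seconds spent dwelling at stop


def build_graph(pairs):
    """Directed adjacency map built from (from_stop, to_stop) pairs."""
    graph = defaultdict(list)
    for a, b in pairs:
        graph[a].append(b)
    return graph


def add_aligned(left, right):
    """Sum two per-day mappings, keeping only days present in both."""
    return {day: left[day] + right[day] for day in left.keys() & right.keys()}


def walk(origin, graph, move, dwell):
    """Yield (destination, per-day cumulative seconds) for each forward path."""
    # Stack entries: (current stop, cumulative sums or None at origin, visited stops)
    stack = [(origin, None, frozenset([origin]))]
    while stack:
        current, total, seen = stack.pop()
        for nxt in graph.get(current, []):
            if nxt in seen:
                continue  # guard against cycles in the adjacency data
            leg = move[(current, nxt)]
            if total is None:
                # Leaving the origin: origin dwell is excluded.
                cumulative = dict(leg)
            else:
                # Intermediate stop: add its dwell, then the next movement.
                cumulative = add_aligned(add_aligned(total, dwell[current]), leg)
            yield nxt, cumulative
            stack.append((nxt, cumulative, seen | {nxt}))


# Tiny made-up example: A -> B -> C, two service days.
example_move = {
    ("A", "B"): {"d1": 100, "d2": 110},
    ("B", "C"): {"d1": 120, "d2": 130},
}
example_dwell = {"B": {"d1": 30, "d2": 40}}
example_graph = build_graph(example_move.keys())
results = dict(walk("A", example_graph, example_move, example_dwell))
```

In the example, results["C"] holds move(A→B) + dwell(B) + move(B→C) for each day, with no dwell at A. A day missing from any leg drops out of the sum, which is what "aligned" means in step 4.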

Running locally

cd mbta-performance
uv run python -m chalicelib.benchmarks.tm_benchmarks

Reads the SlowZones archive and writes the output JSONs directly. Cheaper than invoking the Lambda; intended for ad-hoc refreshes.

Cost impact

Negligible. Monthly Lambda run over ~175 small CSVs per line, ~5–10 minutes total. One new IAM role, one new cron, and five small JSON objects written to the existing bucket.

Test plan

  • Run locally and eyeball Red / Orange / Blue / Mattapan / Green output JSONs for sanity (Davis→Porter ≈ 2m, Davis→Kendall ≈ 9m, etc.)
  • Verify against a known slow-zone pair — TM benchmark should sit below the current MBTA scheduled time.
  • Deploy to beta, confirm cron fires on 1st of next month.

Adds a new chalicelib/benchmarks module that builds TM-defined travel-time
benchmarks from the existing SlowZones archive (per-day p50 travel times and
dwells, back to 2016-01-15). A directed graph built from adjacent stop-pair
filenames is walked via DFS from each stop, summing per-day p50 move + dwell
series along each path. Intermediate dwells are included; origin dwell is not
(matches how t-performance-dash measures travel time: departure at origin to
arrival at destination).

The TM benchmark per pair is the median of the aligned per-day sums, ceil'd to
30s. Pairs with fewer than 365 aligned service-days are skipped. Output is one
small JSON per rapid-transit line at
s3://tm-mbta-performance/Benchmarks-tm/traveltimes/{Color}.json.
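As a rough sketch of the reduction described in this paragraph, the per-pair benchmark could be computed like this. The function and constant names are hypothetical and the module itself may be structured differently. keep_minimum reflects the rule from the PR description that the smallest benchmark wins when several paths reach the same stop.

```python
import math
from statistics import median

MIN_SERVICE_DAYS = 365  # pairs with fewer aligned days are skipped
ROUND_TO_SECONDS = 30   # benchmarks are rounded up to a 30-second multiple


def benchmark_seconds(daily_sums, min_days=MIN_SERVICE_DAYS):
    """Median of per-day sums rounded up to 30s, or None if too few days."""
    if len(daily_sums) < min_days:
        return None
    return math.ceil(median(daily_sums) / ROUND_TO_SECONDS) * ROUND_TO_SECONDS


def keep_minimum(benchmarks, pair, value):
    """Record value for pair unless a smaller benchmark is already stored."""
    if value is None:
        return  # skipped pair: no entry is written
    current = benchmarks.get(pair)
    if current is None or value < current:
        benchmarks[pair] = value


# Made-up numbers; min_days is lowered only to keep the example short.
benchmarks = {}
keep_minimum(benchmarks, ("A", "C"), benchmark_seconds([241, 250, 262], min_days=3))
keep_minimum(benchmarks, ("A", "C"), benchmark_seconds([200, 205, 210], min_days=3))
keep_minimum(benchmarks, ("A", "D"), benchmark_seconds([100] * 10))  # too few days
```

Here the first path gives a median of 250s, rounded up to 270s. The second gives 205s, rounded up to 210s, and replaces it. The ("A", "D") pair has only 10 days against the default threshold of 365, so nothing is stored for it.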

Scheduled monthly (historical p50 barely moves week-to-week). Can also be run
locally with AWS creds: `uv run python -m chalicelib.benchmarks.tm_benchmarks`.
Comment thread mbta-performance/.chalice/config.json Outdated
Local run across all 5 rapid-transit lines finishes in ~1m 45s with modest
memory. 900s / 4096MB was overprovisioned.
@devinmatte devinmatte marked this pull request as ready for review April 19, 2026 22:23
@devinmatte devinmatte requested review from a team, ankoure and hamima-halim as code owners April 19, 2026 22:23