Conversation
…ers, run status, structured alerts
Adds agent-facing query APIs and tooling to make Trackio the definitive tracker
for autonomous ML experiments driven by AI coding agents.
New CLI commands:
- `trackio best` — rank runs by metric, return winner + leaderboard
- `trackio compare` — side-by-side run comparison across metrics
- `trackio summary` — full experiment overview with status/configs/metrics
New features:
- Run status tracking (running/finished/failed) with automatic lifecycle mgmt
- Structured alert data (`data={}` param on `trackio.alert()`)
- Metric watchers (`trackio.watch()`) for auto NaN/spike/stagnation detection
- `trackio.should_stop()` for training loop early stopping
- Python API: `run.metrics()`, `run.history()`, `run.summary`, `run.status`
Test infrastructure:
- Synthetic training simulator (no ML deps, runs in seconds)
- Agent test runner with 5 experiment types
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve merge conflicts integrating autonomous ML features with main branch changes (run_id support, RemoteClient, query command, OAuth, error handling). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
🪼 branch checks and previews
|
Contributor
🦄 change detectedThis Pull Request includes changes to the following packages.
|
🪼 branch checks and previews
Install Trackio from this PR (includes built frontend) pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/2283983183942ecd90479c36298cb09d951f0348/trackio-0.24.0-py3-none-any.whl" |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Fixes:
- Fix 1: Run status "failed" no longer overwritten by "finished" —
finish() now accepts a status parameter, _cleanup_current_run passes
status="failed" directly
- Fix 2: Watcher patience supports both min and max mode via new
mode parameter (was hardcoded to minimization only)
- Fix 3: Replace dead --minimize/--maximize flags with --direction
{min,max} on trackio best
- Fix 5: Watcher _values now uses deque(maxlen=window) to bound memory;
alert dedup via per-condition flags that reset on return to normal
- Fix 6: Warn if watchers exist when init() clears them; docstring
documents ordering requirement
- Fix 7: Drop Api.Run.summary cache — recompute on each access
- Fix 9: set_run_status/get_run_status now accept and use run_id,
INSERT handles run_id NOT NULL column in new schema
- Fix 12: Remove unused enumerate variable in agent_runner
Tests:
- test_watchers.py: 18 tests covering nan, spike, max/min threshold,
dedup, patience min/max mode, window bounds, manager propagation
- test_run_status.py: 6 tests covering running→finished, failed status,
idempotent finish, Api.Run.status, multi-run status
- test_cli_agent_commands.py: 10 tests covering best/compare/summary
in JSON and human-readable modes, error cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
trackio best,trackio compare,trackio summary— collapse multi-step agent workflows into single calls with--jsonoutputrunning/finished/failed, exposed via CLI and Python APItrackio.alert(data={...})for machine-readable payloads agents can parse without text extractiontrackio.watch()+trackio.should_stop()for automatic NaN/spike/stagnation/threshold detectionrun.metrics(),run.history(),run.summary,run.statusonApi.RunTest plan
ruff checkandruff formatpass cleantrackio best,trackio compare,trackio summarywith JSON and human-readable outputshould_stop()🤖 Generated with Claude Code