Skip to content

Add autonomous ML experiment support#521

Draft
Saba9 wants to merge 7 commits intomainfrom
saba/auto-ml
Draft

Add autonomous ML experiment support#521
Saba9 wants to merge 7 commits intomainfrom
saba/auto-ml

Conversation

@Saba9
Copy link
Copy Markdown
Collaborator

@Saba9 Saba9 commented Apr 20, 2026

Summary

  • 3 new CLI commands for agent workflows: trackio best, trackio compare, trackio summary — collapse multi-step agent workflows into single calls with --json output
  • Run status tracking — runs automatically marked as running/finished/failed, exposed via CLI and Python API
  • Structured alert datatrackio.alert(data={...}) for machine-readable payloads agents can parse without text extraction
  • Metric watcherstrackio.watch() + trackio.should_stop() for automatic NaN/spike/stagnation/threshold detection
  • Python API completenessrun.metrics(), run.history(), run.summary, run.status on Api.Run
  • Test harness — synthetic training simulator + agent test runner with 5 experiment types (LR search, architecture search, failure recovery, long monitoring, multi-objective)

Test plan

  • All 14 e2e-local tests pass
  • ruff check and ruff format pass clean
  • Manual verification: trackio best, trackio compare, trackio summary with JSON and human-readable output
  • Manual verification: run status tracking (running → finished, running → failed on crash)
  • Manual verification: structured alert data round-trips through storage and CLI
  • Manual verification: metric watchers fire alerts and trigger should_stop()
  • Run agent_runner.py end-to-end with all 5 experiments

🤖 Generated with Claude Code

Saba9 and others added 4 commits April 20, 2026 10:45
…ers, run status, structured alerts

Adds agent-facing query APIs and tooling to make Trackio the definitive tracker
for autonomous ML experiments driven by AI coding agents.

New CLI commands:
- `trackio best` — rank runs by metric, return winner + leaderboard
- `trackio compare` — side-by-side run comparison across metrics
- `trackio summary` — full experiment overview with status/configs/metrics

New features:
- Run status tracking (running/finished/failed) with automatic lifecycle mgmt
- Structured alert data (`data={}` param on `trackio.alert()`)
- Metric watchers (`trackio.watch()`) for auto NaN/spike/stagnation detection
- `trackio.should_stop()` for training loop early stopping
- Python API: `run.metrics()`, `run.history()`, `run.summary`, `run.status`

Test infrastructure:
- Synthetic training simulator (no ML deps, runs in seconds)
- Agent test runner with 5 experiment types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve merge conflicts integrating autonomous ML features with main branch
changes (run_id support, RemoteClient, query command, OAuth, error handling).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gradio-pr-bot
Copy link
Copy Markdown
Contributor

gradio-pr-bot commented Apr 20, 2026

🪼 branch checks and previews

Name Status URL
🦄 Changes detected! Details

@gradio-pr-bot
Copy link
Copy Markdown
Contributor

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
trackio minor

  • Add autonomous ML experiment support

‼️ Changeset not approved. Ensure the version bump is appropriate for all packages before approving.

  • Maintainers can approve the changeset by checking this checkbox.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

HuggingFaceDocBuilderDev commented Apr 20, 2026

🪼 branch checks and previews

Name Status URL
Spaces ready! Spaces preview

Install Trackio from this PR (includes built frontend)

pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/2283983183942ecd90479c36298cb09d951f0348/trackio-0.24.0-py3-none-any.whl"

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Saba9 and others added 2 commits April 20, 2026 11:50
Fixes:
- Fix 1: Run status "failed" no longer overwritten by "finished" —
  finish() now accepts a status parameter, _cleanup_current_run passes
  status="failed" directly
- Fix 2: Watcher patience supports both min and max mode via new
  mode parameter (was hardcoded to minimization only)
- Fix 3: Replace dead --minimize/--maximize flags with --direction
  {min,max} on trackio best
- Fix 5: Watcher _values now uses deque(maxlen=window) to bound memory;
  alert dedup via per-condition flags that reset on return to normal
- Fix 6: Warn if watchers exist when init() clears them; docstring
  documents ordering requirement
- Fix 7: Drop Api.Run.summary cache — recompute on each access
- Fix 9: set_run_status/get_run_status now accept and use run_id,
  INSERT handles run_id NOT NULL column in new schema
- Fix 12: Remove unused enumerate variable in agent_runner

Tests:
- test_watchers.py: 18 tests covering nan, spike, max/min threshold,
  dedup, patience min/max mode, window bounds, manager propagation
- test_run_status.py: 6 tests covering running→finished, failed status,
  idempotent finish, Api.Run.status, multi-run status
- test_cli_agent_commands.py: 10 tests covering best/compare/summary
  in JSON and human-readable modes, error cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants