Skip to content

VectorSpaceLab/Auto-ML-Skills

Repository files navigation

Auto-ML-Skills

Distilling ML repositories and research papers into reusable Agent Skills

Skill Library: 170 repo skills DisCo CLI v0.0.4 Meta Skills Contributing Guide License: Apache 2.0

English | 简体中文

Auto-ML-Skills helps coding agents stop treating ML repositories and papers as anonymous context. It distills source-grounded operating knowledge from software repositories and research papers into compact Agent Skills, then gives DisCo, a TypeScript CLI, the workflows needed to create, verify, refresh, extend, import, and maintain those skills. The result is a runtime skill library that can guide agents through real ML software and paper-derived methods with less API guessing, fewer wasted tokens, and stronger evidence discipline.

At the current checkout, the public library contains 170 repository-specific runtime skills plus a router skill for progressive selection. The same repo also includes DisCo source code, copyable meta skills, architecture notes, and the Paper2Skills Distiller workflow for turning research papers into modular skills.

🧭 Table Of Contents

📣 News

  • 2026-06-28: Initial release of Auto-ML-Skills, including the public runtime skill library, the DisCo CLI for repo-skill and paper-to-skill workflows, and the companion meta skills for bringing DisCo workflows into agents such as Codex and Claude Code.

💡 Why Auto-ML-Skills

Modern coding agents can already write useful machine-learning code, but they often struggle when the correct action depends on a living repository rather than a generic package memory.

  • Repo-specific APIs are easy to misuse. ML libraries hide important behavior in configs, launchers, examples, registry systems, data formats, and version-specific conventions.
  • Package choice is itself a task. LLM serving, RAG, bio/chem, vision, MLOps, evaluation, RL, and scientific Python stacks overlap heavily; agents need a routing map before they can pick the right tool.
  • Fresh source evidence matters. The safest instruction often comes from today's checkout, package metadata, tests, examples, and upstream commit rather than a stale public-memory summary.
  • Papers need operational distillation. A paper's reusable knowledge is often split across method sections, equations, ablations, data assumptions, and optional implementation repos; agents need that knowledge converted into testable module-level skills before recovery work is credible.
  • Trial and error is expensive. Unstructured exploration can burn turns, downloads, GPU jobs, and debugging time before the agent reaches the workflow the repository already documents.

Auto-ML-Skills addresses this by making repository knowledge installable, verifiable, and routable. A skill is not a broad tutorial; it is a compact operating map that tells an agent how to work with a specific project, when to load deeper references, and which mistakes to avoid. Paper-derived skills apply the same idea to research methods: they turn a paper into reusable modules that can be validated, invoked, and refined during bounded recovery runs.

🧰 What Is Included

Layer Location What it provides
Runtime skill library repo-skills/ 170 repository-specific ML, LLM, agent, RAG, bio/chem, vision, MLOps, RL, evaluation, and scientific Python skills, plus repo-skills-router for selection.
DisCo CLI source src/ The @auto-ml-skills/disco TypeScript workspace, exposing the disco command and bundled workflows for repo-skill creation, verification, import, refresh, extension, and Paper2Skills distillation.
Workflow meta skills meta-skills/ Lightweight package/repo and paper-to-skill workflows that can be copied into Codex or Claude Code when you do not need the full CLI source.
Documentation docs/ Architecture notes and the public imported-skill catalog with upstream repositories, package versions, commits, and coverage summaries.

Which Part Should You Use?

  • Use the skill library when you want an agent to use existing ML repo knowledge.
    • Copy repo-skills/ into DisCo's managed library at ~/.disco/agent/skills/.
    • Then import selected or all repo skills into Codex, Claude Code, or another target agent.
  • Use the DisCo CLI when you want to create or maintain skills.
    • Create, verify, refresh, extend, and import repo skills.
    • Distill papers into reusable module-level skills with disco --source paper.
    • Keep routing metadata and repo-skills-router updated for imported skills.
  • Use workflow meta skills when another agent should run the workflows without the full CLI source.
    • Copy meta-skills/ into the target agent's skills/ directory.
    • Use this path for portable repo-skill and paper-to-skill workflows in Codex, Claude Code, or similar agents.

🗂️ Library Coverage

The included skill catalog is maintained in docs/imported-repo-skills.md. It records each skill's upstream repository, update date, package version information, source commit, and intended workflow coverage.

Area Examples from the included library
ML infrastructure and training Dask, DGL, PyTorch Lightning, Optuna, PyTorch Geometric
Data preparation and evaluation MTEB, LM Evaluation Harness, Datasets, Evaluate, OpenCompass, Pillow, TorchVision
LLM training, fine-tuning, and serving Axolotl, DeepSpeed, Transformers, PEFT, NeMo, vLLM, SGLang, Unsloth, TRL
Agents and agentic workflows Browser Use, CAMEL, CrewAI, OpenHands, MetaGPT, LangFlow, LangChain, LangGraph, AutoGen, OpenAI Agents SDK
RAG and document AI Haystack, Docling, LightRAG, RAGFlow, Khoj, Kotaemon, GraphRAG, LlamaIndex, Qdrant Client, Unstructured
RL and distributed AI systems Gymnasium, Ray, Acme, AgileRL, Stable-Baselines3, CleanRL, PettingZoo, Tianshou
Bio, chemistry, vision, and scientific Python AlphaFold, AlphaFold3, OmegaFold, OpenFE, DeepMD-kit, Scanpy, MONAI, MMCV, MMDetection, ComfyUI
MLOps and orchestration Airflow, BentoML, Dagster, Feast, Great Expectations, MLflow, ZenML, ClearML, Kedro, Snakemake, W&B

⚙️ Installation

The minimal setup has two steps:

  1. Install the disco CLI.
  2. Install the skill library into DisCo's managed skill directory.

Installing workflow meta skills is optional. Use that path only when you want another agent to run DisCo-style creation or paper-to-skill workflows without using the full CLI.

Install DisCo

Install the DisCo CLI from npm:

npm install -g @auto-ml-skills/disco
disco

DisCo requires Node.js >=22.19.0. pi natively supports 35 model providers, and DisCo inherits that provider layer. Configure at least one provider in the startup flow with /login, or use environment variables such as OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY, or MISTRAL_API_KEY.

Build from source for local development
git clone https://github.com/VectorSpaceLab/Auto-ML-Skills.git
cd Auto-ML-Skills
bash scripts/build-from-source-link.sh

The script installs workspace dependencies, builds the TypeScript packages, and links the disco command globally for local use.

Install The Runtime Skill Library

Clone this repository and copy the runtime repo skills into DisCo's managed skills directory:

git clone https://github.com/VectorSpaceLab/Auto-ML-Skills.git
cd Auto-ML-Skills
mkdir -p ~/.disco/agent/skills
cp -R repo-skills/* ~/.disco/agent/skills/

Restart DisCo after copying so the managed skill index is reloaded.

Install Workflow Meta Skills (Optional)

The top-level meta-skills/ directory contains workflow skills for agents that should run DisCo-style repo-skill or paper-to-skill workflows without relying on the full DisCo CLI.

If you do not already have a local checkout, clone this repository first:

git clone https://github.com/VectorSpaceLab/Auto-ML-Skills.git
cd Auto-ML-Skills

Install all workflow meta skills into Codex:

mkdir -p ~/.codex/skills
cp -R meta-skills/* ~/.codex/skills/

Install all workflow meta skills into Claude Code:

mkdir -p ~/.claude/skills
cp -R meta-skills/* ~/.claude/skills/
Install only the paper-to-skill workflow into Codex
mkdir -p ~/.codex/skills
cp -R \
  meta-skills/create-paper-skills \
  meta-skills/paper-skills-distiller \
  meta-skills/plan-paper-skill-modules \
  meta-skills/create-paper-module-skill \
  meta-skills/prepare-paper-recovery-env \
  meta-skills/recover-paper-result \
  meta-skills/analyze-paper-recovery \
  ~/.codex/skills/

See meta-skills/README.md for the workflow list, Claude Code paper-only install command, copy-and-run agent installation prompts that clone the repository automatically, and default workflow artifact layout.

🚀 Quick Start

Use Repo Skills In Codex Or Claude Code

After the skill library is installed in DisCo's managed skills directory, use DisCo's import workflow to export selected or all repo skills into your target agent. For example, import the router plus the vllm and sglang skills into Claude Code:

disco -p "/skill:import-repo-skills-to-agent import vllm and sglang to ~/.claude"

To import the same skills into Codex:

disco -p "/skill:import-repo-skills-to-agent import vllm and sglang to ~/.codex"

Restart the agent, then ask for a concrete deployment task:

Use the repo skills to compare vLLM and SGLang for deploying Qwen3-32B on this
machine, then prepare a minimal OpenAI-compatible serving plan with launch
commands, environment checks, and a smoke-test request.
Hint: make the router easy for your agent to use

After importing repo skills, tell your agent to consult repo-skills-router when a user request could benefit from installed repository skills. A project CLAUDE.md or AGENTS.md can include a short instruction such as:

When a task involves ML libraries, LLM serving, RAG, agents, bio/chem, vision,
MLOps, RL, evaluation, or scientific Python, proactively check
repo-skills-router before choosing a library-specific approach.

You can also invoke the router directly in a request:

/repo-skills-router compare vLLM and SGLang for this deployment task
$repo-skills-router compare vLLM and SGLang for this deployment task

Use /repo-skills-router in Claude Code and $repo-skills-router in Codex.

Create A Skill For A Repository

Use DisCo to create and verify a repo-specific skill from source evidence:

disco -p "Create a repo skill for /path/to/repo."

The workflow analyzes repository structure, prepares or checks a Python inspection environment when needed, writes runtime guidance, records provenance, and then hands the draft to verify-repo-skill. Verification creates assertion-backed usability cases, runs content-level self-refine, checks safe native examples or tests when available, runs static quality gates, and writes coverage and review artifacts before the skill is treated as ready.

To let the agent choose the extraction scope and import the verified skill into DisCo's managed library without another confirmation round, delegate both decisions in the request:

disco -p "Create a repo skill for /path/to/repo with auto decide and auto import."

Create Skills From A Paper

Use the paper-to-skill workflow integrated in the DisCo CLI when the source is a research paper rather than a software repository. For repeatable runs, copy and fill the bundled run-config template, then pass it to DisCo:

cp meta-skills/create-paper-skills/assets/distiller-run-config-template.toml \
  /path/to/distiller_run_config.toml
disco --source paper -p "Use Distiller to process the runs in this config. config_path: /path/to/distiller_run_config.toml"

The paper source can be a local PDF or text file, direct PDF URL, arXiv URL or id, or paper title. An implementation repository is optional and can be a local path, Git URL, none, or unknown. Distiller modularizes the paper, creates and validates module-level skills, prepares bounded runtime evidence, runs the strongest feasible recovery experiment without reading the original implementation repo, analyzes gaps, refines within iteration_budget when needed, and writes attempt artifacts plus final reports under <attempt_dir>/reports/final/. The default recovery_mode is hard, so reduced, proxy, toy, or fallback runs are recorded as diagnostics rather than accepted as successful recovery unless you explicitly choose soft mode.

Extend An Existing Skill

Ask DisCo to extend an existing skill when it is correct but needs deeper coverage for a new workflow area:

disco -p "Add streaming inference coverage to the existing skill at /path/to/repo/skills/example-skill using /path/to/repo as evidence."

Refresh A Skill After Upstream Changes

Ask DisCo to refresh a skill when the upstream repository changes APIs, configs, examples, dependencies, or runtime behavior:

disco -p "Refresh the skill at /path/to/repo/skills/example-skill against the current /path/to/repo code."

Refresh should preserve correct existing guidance while updating stale instructions against the current source baseline.

🛠️ DisCo Workflow Skills

DisCo bundles workflow skills that orchestrate skill creation, verification, maintenance, import, and paper distillation. They are available inside the CLI and mirrored under meta-skills/ for optional installation into other agents.

  • Package and repository workflows
    • create-repo-skill: create a repo-specific skill from source code, docs, examples, tests, package metadata, and optional installed-package inspection.
    • prepare-repo-skill-env: prepare or verify an isolated Python inspection environment before deeper repository analysis.
    • verify-repo-skill: verify generated or refreshed repo skills with usability cases, content self-refine, safe native checks, static gates, reports, and import-readiness checks.
    • refresh-repo-skill: update an existing skill when upstream APIs, configs, examples, dependencies, or runtime behavior change.
    • extend-repo-skill: add deeper coverage to an existing skill for a new workflow area.
    • repo-skills-router: route user requests across installed repo skills by scenario and package coverage.
    • import-repo-skills-to-agent: copy selected or all managed repo skills, plus the router, into Codex, Claude Code, or another agent skill directory.
  • Paper-to-skill workflows
    • create-paper-skills: entry point for disco --source paper requests.
    • paper-skills-distiller: orchestrate source resolution, paper modularization, module-skill creation, recovery, analysis, refinement, and final reporting.
    • plan-paper-skill-modules: read the paper and produce the paper profile, module plan, and module docs.
    • create-paper-module-skill: convert each module doc into a reusable generated Agent Skill with validation checks.
    • prepare-paper-recovery-env: prepare bounded runtime evidence, package setup, model/data state, and recovery handoff artifacts.
    • recover-paper-result: run a bounded recovery experiment using generated skills without reading the original implementation repo.
    • analyze-paper-recovery: compare recovery evidence against the paper target and produce accept, refine, or blocker feedback.

🤝 Contributing

We welcome contributions in three main areas:

  1. Contribute generated repo skills. Add a publishable runtime skill under repo-skills/<skill-id>/, include provenance and routing metadata, and update repo-skills-router so agents can discover it.
  2. Extend or refresh existing repo skills. Improve stale, incomplete, or unclear skills with source-grounded changes. Update provenance or routing metadata when the upstream baseline or coverage changes.
  3. Improve the DisCo CLI source. Changes to the TypeScript CLI under src/ are welcome, including package/repo and paper-to-skill workflows. Run focused checks and document behavior changes. Repo-skill workflow changes should preserve the create/verify split, review/test artifact layout, import-readiness gates, and locked router-update transaction. Updates to the integrated Paper2Skills workflow should preserve its source-resolution, modularization, generated-skill validation, recovery, analysis, and final-report contracts.

For repo-skill PRs, list the model, provider, reasoning or thinking level, source repository commit, and verification steps used to produce or revise the skill. For DisCo CLI changes that touch paper-to-skill behavior, include the paper source, run config, recovery mode, validation artifacts, and final report path when applicable. See CONTRIBUTING.md for the full checklist.

📚 Documentation

Page Description
Imported Repo Skills Catalog Public catalog of included runtime repo skills, grouped by workflow area with upstream baselines.
Architecture Repository layers, DisCo source layout, skill authoring pipeline, runtime skill shape, and managed library model.
Workflow Meta Skills Copyable package/repo and paper-to-skill workflow skills for external agents.
DisCo CLI README DisCo CLI usage for repo-skill creation, import, verification, and paper-to-skill workflows.
Contributing Contribution rules for generated repo skills, router/catalog updates, documentation, meta skills, and CLI source.

🙏 Acknowledgement

DisCo's CLI and agent runtime are built on the foundation of earendil-works/pi, an open-source AI agent toolkit with a unified LLM API, agent loop, terminal UI, and coding-agent CLI.

Auto-ML-Skills is also made possible by the GitHub open-source community. The repo skills in this library exist because many researchers and engineers have released high-quality ML, agent, data, bio/chem, vision, and infrastructure projects for the community to build on.

📄 License

Auto-ML-Skills is released under the Apache License 2.0. Unless a file explicitly states otherwise, the license applies to both the DisCo CLI source code in src/ and the open-sourced runtime repo skills under repo-skills/.

See LICENSE for the full license text.

📝 Citation

TBA

About

A Skill Library for Automated Machine Learning

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors