merge: sync upstream/main + PR #161 (plugin + compound tools) #30
…s-harvard#153)

Skills (114 total):
- Rewrite 80+ skills as reasoning guides (not reference tables)
- Add LOOK UP DON'T GUESS and COMPUTE DON'T DESCRIBE across all skills
- Add new skills: data-wrangling (24 domain API patterns), dataset-discovery, epidemiological-analysis, data-integration-analysis, ecology-biodiversity, inorganic-physical-chemistry, plant-genomics, vaccine-design, stem-cell, lipidomics, non-coding-RNA, aging-senescence
- Add Programmatic Access sections to 6 domain skills (TCGA, GWAS, spatial-transcriptomics, variant-to-mechanism, binder-discovery, clinical-trials)
- Generalize all analysis skills to be data-source-agnostic
- Add progressive disclosure: references/ for specialized domains
- Improve skill descriptions for better triggering

Tools (31 new):
- RGD (4 tools), T3DB toxins, IEDB MHC binding prediction
- 11 scientific calculator tools (DNA translate, molecular formula, equilibrium solver, enzyme kinetics, statistics, etc.)
- AgingCohort_search (28+ longitudinal cohort registry)
- NHANES_download_and_parse (XPT download + parse + age filter)
- DataQuality_assess (missingness, outliers, correlations)
- MetaAnalysis_run (fixed/random effects, I-squared, Q-test)
- 4 dataset discovery tools (re3data, Data.gov, OpenAIRE, DataCite)

Bug fixes:
- Fix 50+ tool name references across skills
- Fix NHANES search (dynamic CDC catalog query, not hardcoded keywords)
- Fix tool return envelopes (Unpaywall, MyGene, HPA, EuropePMC)
- Fix STRING, OpenTargets, ENCODE, Foldseek, STITCH, BridgeDb
- Fix BindingDB test for broken API detection

Router:
- Add MC elimination strategy, batch processing protocol
- Add 20+ bundled computation scripts
- Route to all 114 skills

Version bumped to 1.1.11
New plugin/ directory with official Claude Code plugin format:
- .claude-plugin/plugin.json: manifest (name, version, description)
- .mcp.json: auto-configures ToolUniverse MCP server with --refresh
- settings.json: auto-approve read-only discovery tools
- commands/find-tools.md: /tooluniverse:find-tools slash command
- commands/run-tool.md: /tooluniverse:run-tool slash command
- agents/researcher.md: autonomous research agent with 1000+ tools
- README.md: install and usage documentation

Build script: scripts/build-plugin.sh
- Assembles distributable plugin from repo (manifest + skills + agents)
- Copies all 113 tooluniverse-* skills into plugin/skills/
- Output: dist/tooluniverse-plugin/ (7.6MB, 520 files)

Install: claude --plugin-dir dist/tooluniverse-plugin
gene-regulatory-networks and population-genetics had markdown headings instead of YAML frontmatter, preventing Claude Code skill discovery.
Addressed 4 weaknesses found in A/B testing:
1. Reduce discovery overhead: Added example parameters to all tools in quick reference — agent can call directly without get_tool_info
2. Enforce batching: Added explicit Python batch pattern with code example in both research command and researcher agent
3. Prevent trial-and-error: Added exact parameter formats (e.g., OncoKB needs "operation" field, OpenTargets needs ensemblId not gene symbol)
4. Added /tooluniverse:research command — comprehensive slash command with full tool reference table and efficiency rules

Test results: find_tools calls reduced 75% (4→1), subagent spawns eliminated, cross-validation now happening across 4 databases.
MCP is good for tool discovery (find_tools, get_tool_info) but inefficient for batch data retrieval (37 sequential execute_tool calls). Changed strategy: use CLI (tu run) via Python scripts for all actual data retrieval. One Python script with 10 tu_run() calls replaces 10 sequential MCP calls. MCP reserved for discovery only. Updated: researcher agent, research command, find-tools command, README. Added tu_run() helper function pattern and Python SDK example.
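The tu_run() helper pattern described above might look like the sketch below. The exact `tu run` CLI flags shown here are assumptions for illustration, not the documented interface; the point is that one script batches many tool calls into a single process instead of many sequential MCP round-trips.

```python
import json
import subprocess


def build_tu_command(tool_name, arguments):
    """Build the argv list for a 'tu run' invocation.

    The '--arguments JSON' flag shape is an assumption for this sketch;
    check 'tu run --help' in your install for the real interface.
    """
    return ["tu", "run", tool_name, "--arguments", json.dumps(arguments)]


def tu_run(tool_name, arguments):
    """One CLI round-trip per tool call; batch many of these in one script."""
    proc = subprocess.run(
        build_tu_command(tool_name, arguments),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

A batch script then simply calls `tu_run()` in a loop and collects results, replacing N sequential MCP calls with one script execution.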
…ketplace

- plugin/skills/ now contains per-skill symlinks to ../../skills/tooluniverse-* + setup-tooluniverse, so the plugin directory is self-contained without moving the source skills/ folder.
- plugin/sync-skills.sh regenerates the symlink set when skills are added.
- plugin/.claude-plugin/marketplace.json declares the plugin dir as a single-plugin marketplace, enabling the 'claude plugin install tooluniverse@tooluniverse-local' workflow.
- .gitignore excludes benchmark outputs (skills/evals/*/results_*.json), memory notes, and API-key patterns from the repo.
- .gitattributes adds export-ignore for non-plugin directories so 'git archive' produces a clean release tarball.
… content

commands/research.md is now scoped to TU usage (tool recipes, compound tools, skill dispatch table). Domain analysis guidance moved into the matching specialized skills so content has a single owner.

Skill additions (each skill gains a 'BixBench-verified conventions' section):
- tooluniverse-statistical-modeling: clinical-trial AE inner-join pattern, OR reduction semantics, F-stat vs p-value distinction, spline pure-strain anchor, frequency-ratio output format, CSV latin1 fallback.
- tooluniverse-rnaseq-deseq2: authoritative-script pattern (copy ALL kwargs literally, incl. refit_cooks=True), R vs pydeseq2 selection rule, strain identity parsing, 'uniquely DE' exclusive semantics, denominator check for set-operation percentages.
- tooluniverse-gene-enrichment: R clusterProfiler vs gseapy selection, simplify(0.7) term-collapse caveat, explicit universe= background rule.
- tooluniverse-crispr-screen-analysis: sgRNA-level Spearman convention, Reactome GSEA ranking column, literal pathway-name matching.
- tooluniverse-phylogenetics: parsimony-informative site gap-only exclusion, treeness ratio definition.
- tooluniverse-variant-analysis: multi-row Excel header parsing, SO-term coding vs non-coding denominator split.

tooluniverse-drug-target-validation improvements for the ML demo:
- Top-level 'RUN THE ML MODELS, DON'T SKIP THEM' rule alongside 'LOOK UP DON'T GUESS'.
- New Phase 3b requiring all 10 ADMET-AI Chemprop-GNN endpoints and a side-by-side head-to-head table when multiple candidate compounds exist.
- Phase 8 now mandates ESMFold + DoGSite3 (ProteinsPlus) even when PDB structures exist, so the deep-learning inference is always in the trace.
- Phase 10 adds a 'Deep-Learning Models Contributing' attribution table naming each ML predictor's architecture and contribution.
ADMET-AI tools segfaulted (exit 139) via tu CLI / MCP server on macOS Apple Silicon. Root cause: torch MPS backend crashes in forked subprocesses. Fix: torch.set_default_device('cpu') at package init + env vars.
research.md: add skill dispatch table at top so /tooluniverse:research routes cancer-mutation queries to precision-oncology, target-validation queries to drug-target-validation, etc.

precision-oncology: promote FAERS to MANDATORY (was an optional bullet). Agent now calls FAERS_search_adverse_event_reports for the top 1-2 drugs before finalizing.

drug-target-validation: add ADMET-AI SDK fallback pattern — if MCP calls fail, the agent retries via the Python SDK in Bash.

.mcp.json: add PYTORCH env vars for MPS fallback.
Make Claude Code plugin installation a two-command flow:

claude plugin marketplace add mims-harvard/ToolUniverse
claude plugin install tooluniverse@tooluniverse

Changes:
- .claude-plugin/marketplace.json at repo root with source: ./plugin (enables GitHub owner/repo marketplace add without sparse checkout)
- skills/tooluniverse-install-plugin/SKILL.md: user-facing install guide (prereqs, two-command install, version pinning, verify, API keys, update/uninstall, offline zip path, troubleshooting table)
- .github/workflows/release-plugin.yml: on tag push, build tooluniverse-plugin-vX.Y.Z.zip with resolved skills symlinks and a rewritten marketplace.json, attach to the GitHub release
- plugin/README.md: replace local path install with the marketplace flow, link to the install skill
- skills/setup-tooluniverse/SKILL.md: callout for Claude Code users pointing at the plugin install path over manual MCP config
The install skill is Claude-Code-plugin-specific, so name it that way — `tooluniverse-install-plugin` was ambiguous (install what? which plugin?). Renamed directory + frontmatter name + all inbound refs in plugin/README.md, setup-tooluniverse skill, and the release workflow.
Implements the plan for improving plugin output quality on multi-database questions:
Compound tools (3 new, each aggregates multiple atomic databases):
- gather_gene_disease_associations — DisGeNET + OMIM + OpenTargets
+ GenCC + ClinVar with cross-source concordance scoring
- annotate_variant_multi_source — ClinVar + gnomAD + CIViC + UniProt
- gather_disease_profile — Orphanet + OMIM + DisGeNET + OpenTargets
+ OLS, returns unified identifiers (orphanet/omim/efo/mondo) +
gene associations
These return structured {status, data} with a sources_failed list,
so partial failures are tolerated without the whole call erroring.
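The partial-failure envelope described above could be implemented along these lines. This is an illustrative sketch, not the actual ToolUniverse code; `gather_from_sources` and its signature are invented for the example.

```python
def gather_from_sources(sources, query):
    """Aggregate results from several source callables, tolerating failures.

    sources: dict mapping source name -> callable(query) returning data.
    Returns a {status, data} envelope with a sources_failed list, so one
    broken upstream API does not fail the whole compound call.
    """
    data, failed = {}, []
    for name, fetch in sources.items():
        try:
            data[name] = fetch(query)
        except Exception as exc:
            failed.append({"source": name, "error": str(exc)})
    status = "success" if data else "error"
    return {"status": status, "data": data, "sources_failed": failed}
```

A caller checks `sources_failed` to decide whether the concordance scoring is based on all sources or a subset.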
MSigDB tool + config:
- check_gene_in_set / get_gene_set_members operations covering GTRD
TF targets, miRDB miRNA targets, oncogenic sigs (C6), hallmarks (H)
Benchmark harness skill (skills/devtu-benchmark-harness):
- run_eval.py — unified runner for lab-bench + BixBench, with
--mode, --category, --n, --timeout; resumes from existing results
- grade_answers.py — exact / MC / range / normalized / numeric /
LLM-verifier strategies, batch grading
- analyze_results.py — category accuracy, per-q plugin-vs-baseline
delta, failure classification (timeout / error / wrong / grading)
- generate_report.py — markdown report with exec summary + top
failures
- Phase 3.5 in devtu-self-evolve invokes the harness after testing
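The grading strategies listed for grade_answers.py could be dispatched along these lines. This is a sketch; the actual script's interface may differ, and the LLM-verifier strategy is omitted because it needs a model call.

```python
def grade(predicted, expected, strategy="exact", tol=0.01):
    """Grade one answer with the named strategy.

    'mc' compares leading choice letters, 'range' checks (lo, hi) bounds,
    'numeric' allows a relative tolerance, 'normalized' ignores case and
    whitespace.
    """
    if strategy == "exact":
        return predicted == expected
    if strategy == "mc":
        return predicted.strip().upper()[:1] == expected.strip().upper()[:1]
    if strategy == "range":
        lo, hi = expected
        return lo <= float(predicted) <= hi
    if strategy == "numeric":
        return abs(float(predicted) - float(expected)) <= tol * abs(float(expected))
    if strategy == "normalized":
        return " ".join(predicted.lower().split()) == " ".join(expected.lower().split())
    raise ValueError(f"unknown strategy: {strategy}")
```

Batch grading is then a loop over (question, strategy) pairs accumulating per-category accuracy.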
Plumbing:
- _lazy_registry_static.py: 4 new tool class entries
- default_config.py: 3 new JSON paths for compound tools
- skills/evals: question banks for bixbench (61 Q) and lab-bench
(20 Q) checked in; result snapshots gitignored
- tests/test_claude_code_plugin.py: 700 lines validating plugin
manifest / MCP / settings / commands / agent / tool refs
- tests/test_aging_cohort_tool.py: 385 lines for AgingCohort tool
…1.11) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ompound tools) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Merges upstream changes and adds/updates ToolUniverse plugin packaging, compound tools, and benchmark harness assets while preserving fork customizations.
Changes:
- Adds Claude Code plugin packaging (plugin manifests/config, build/release workflow) and updates skills/docs for plugin installation and benchmark conventions.
- Introduces compound tools for multi-source gene/disease/variant queries plus new MSigDB and NHANES functionality, and registers new tools in configs/registries.
- Updates numerous tool JSON schemas/examples and adds benchmark harness scripts + eval artifacts.
Reviewed changes
Copilot reviewed 191 out of 193 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/tools/test_semantic_scholar_tool_resilience.py | Removes Semantic Scholar rate-limit regression tests; keeps basic error-shaping/abstract enrichment tests. |
| tests/test_aging_cohort_tool.py | Adds unit tests for AgingCohort search behavior and cohort registry completeness. |
| src/tooluniverse/tools/__init__.py | Registers additional tool wrappers (BioGRID tools, NHANES_download_and_parse). |
| src/tooluniverse/tools/NHANES_download_and_parse.py | Refactors wrapper arg-building and signature for NHANES download/parse. |
| src/tooluniverse/tools/MetaAnalysis_run.py | Tightens typing/docs and wrapper signature for meta-analysis tool. |
| src/tooluniverse/tools/DegreesOfUnsaturation_calculate.py | Adjusts parameter ordering/docs and request arg mapping for DoU tool wrapper. |
| src/tooluniverse/tools/DataQuality_assess.py | Tightens typing/docs and wrapper signature for data quality tool. |
| src/tooluniverse/tools/AgingCohort_search.py | Improves typing and docstring for aging cohort search wrapper. |
| src/tooluniverse/restful_tool.py | Removes response-size trimming logic for closure arrays. |
| src/tooluniverse/msigdb_tool.py | Adds a new MSigDB tool implementation (gene set fetch/search/membership). |
| src/tooluniverse/default_config.py | Adds compound tool JSON files to default tool file set. |
| src/tooluniverse/data/zfin_tools.json | Adds/updates ZFIN search tool schema/examples (now duplicated). |
| src/tooluniverse/data/wikipathways_tools.json | Adjusts test examples for WikiPathways tool. |
| src/tooluniverse/data/unpaywall_tools.json | Updates Unpaywall test examples. |
| src/tooluniverse/data/rcsb_advanced_search_tools.json | Refactors return schema to oneOf success/error forms. |
| src/tooluniverse/data/nhanes_tools.json | Adds NHANES_download_and_parse tool definition. |
| src/tooluniverse/data/msigdb_tools.json | Adds MSigDB tool entries for membership checks + gene set retrieval. |
| src/tooluniverse/data/mgi_tools.json | Refactors MGI schemas to oneOf success/error + metadata wrapping. |
| src/tooluniverse/data/iedb_tools.json | Removes antigen_uniprot parameter/mapping from IEDB tool schema. |
| src/tooluniverse/data/europe_pmc_tools.json | Adds structured full-text retrieval tool schema for Europe PMC. |
| src/tooluniverse/data/ena_portal_tools.json | Refactors ENA portal schemas to oneOf success/error + metadata wrapping. |
| src/tooluniverse/data/datacite_tools.json | Adjusts DataCite schema fields and changes search return schema shape. |
| src/tooluniverse/data/compound_variant_tools.json | Adds compound variant annotation tool schema. |
| src/tooluniverse/data/compound_gene_disease_tools.json | Adds compound gene–disease association tool schema. |
| src/tooluniverse/data/compound_disease_tools.json | Adds compound disease profile tool schema. |
| src/tooluniverse/data/brenda_tools.json | Adds/updates BRENDA enzyme kinetics tool schema (now duplicated). |
| src/tooluniverse/data/admetai_tools.json | Reduces/edits ADMET-AI test examples. |
| src/tooluniverse/compound_variant_tool.py | Implements compound variant annotation tool. |
| src/tooluniverse/compound_gene_disease_tool.py | Implements compound gene–disease association tool. |
| src/tooluniverse/compound_disease_tool.py | Implements compound disease profile tool. |
| src/tooluniverse/admetai_tool.py | Adds torch/MPS safeguards and type-ignore annotations around DataFrame usage. |
| src/tooluniverse/_lazy_registry_static.py | Registers new tool classes for lazy loading (compound tools, MSigDB, DataQualityTool). |
| src/tooluniverse/__init__.py | Adds global torch/MPS environment + default-device forcing at import time. |
| skills/tooluniverse-variant-analysis/SKILL.md | Adds BixBench-verified conventions section. |
| skills/tooluniverse-statistical-modeling/SKILL.md | Adds BixBench-verified conventions section. |
| skills/tooluniverse-rnaseq-deseq2/SKILL.md | Adds BixBench-verified conventions section. |
| skills/tooluniverse-precision-oncology/SKILL.md | Strengthens mandatory safety/pharmacogenomics steps. |
| skills/tooluniverse-population-genetics/SKILL.md | Adds YAML frontmatter metadata for plugin consumption. |
| skills/tooluniverse-phylogenetics/SKILL.md | Adds BixBench-verified conventions section. |
| skills/tooluniverse-gene-regulatory-networks/SKILL.md | Adds YAML frontmatter metadata for plugin consumption. |
| skills/tooluniverse-gene-enrichment/SKILL.md | Adds BixBench-verified conventions section. |
| skills/tooluniverse-drug-target-validation/SKILL.md | Adds mandatory ML-model execution guidance and reporting requirements. |
| skills/tooluniverse-crispr-screen-analysis/SKILL.md | Adds BixBench-verified conventions section. |
| skills/tooluniverse-claude-code-plugin/SKILL.md | Adds new skill documenting plugin installation/maintenance. |
| skills/setup-tooluniverse/SKILL.md | Updates setup guidance to prefer plugin install for Claude Code users. |
| skills/evals/run_benchmark.py | Adds benchmark runner for Claude Code plugin evaluation. |
| skills/evals/research_eval_results.json | Adds evaluation results artifact. |
| skills/evals/lab-bench/questions.json | Adds lab-bench question set. |
| skills/devtu-self-evolve/SKILL.md | Adds benchmark evaluation phase guidance. |
| skills/devtu-benchmark-harness/scripts/run_eval.py | Adds unified benchmark runner script. |
| skills/devtu-benchmark-harness/scripts/grade_answers.py | Adds multi-strategy answer grading script. |
| skills/devtu-benchmark-harness/scripts/generate_report.py | Adds markdown report generator. |
| skills/devtu-benchmark-harness/scripts/analyze_results.py | Adds results analyzer and failure categorization. |
| skills/devtu-benchmark-harness/references/benchmark-guide.md | Adds benchmark guide/reference document. |
| skills/devtu-benchmark-harness/evals/evals.json | Adds eval definitions for harness. |
| skills/devtu-benchmark-harness/SKILL.md | Adds new skill documenting benchmark harness workflow. |
| scripts/build-plugin.sh | Adds local build script for assembling plugin dist directory. |
| plugin/sync-skills.sh | Adds script to symlink user-facing skills into plugin dir. |
| plugin/settings.json | Adds default MCP permission auto-approve settings. |
| plugin/commands/run-tool.md | Adds slash command for executing a named tool. |
| plugin/commands/research.md | Adds primary research slash command guidance and routing rules. |
| plugin/commands/find-tools.md | Adds tool discovery slash command. |
| plugin/agents/researcher.md | Adds autonomous researcher agent definition. |
| plugin/README.md | Adds plugin README and usage notes. |
| plugin/.mcp.json | Adds MCP server config for uvx tooluniverse with env vars. |
| plugin/.claude-plugin/plugin.json | Adds plugin manifest. |
| plugin/.claude-plugin/marketplace.json | Adds local marketplace manifest for plugin. |
| .gitignore | Ignores benchmark outputs and local memory/secrets. |
| .github/workflows/release-plugin.yml | Adds GitHub Action to build and attach plugin zip on tags. |
| .gitattributes | Adds export-ignore rules for archives/releases. |
| .claude-plugin/marketplace.json | Adds root marketplace manifest pointing to ./plugin source. |
Comments suppressed due to low confidence (2)
src/tooluniverse/data/brenda_tools.json:304
- This file now defines BRENDA_get_enzyme_kinetics twice (second copy begins around line 296). Duplicate tool names can cause registry collisions and unpredictable loading. Remove the duplicate entry or consolidate the definitions into one.

src/tooluniverse/data/zfin_tools.json:666
- This file now contains two separate tool objects with the same name ZFIN_search (around lines 603 and 661). Duplicate tool names can lead to ambiguous/last-one-wins behavior when loading the registry. Remove one of the duplicate entries (or rename if they are intended to differ).
```python
_args["variables"] = variables
if age_min is not None:
    _args["age_min"] = age_min
if age_max is not None:
    _args["age_max"] = age_max
if max_rows != 5000:
    _args["max_rows"] = max_rows
```
max_rows is annotated as int, but callers can still pass JSON null which becomes None. In that case, max_rows != 5000 is true and _args['max_rows'] will be set to None, likely failing schema validation downstream. Consider either restoring Optional[int] and guarding with if max_rows is not None and max_rows != 5000, or coercing/validating max_rows before building _args.
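The suggested guard could look like the sketch below, using the parameter names from the fragment above; the helper function itself is invented for illustration.

```python
from typing import Optional


def build_max_rows_arg(_args: dict, max_rows: Optional[int] = 5000) -> dict:
    """Forward max_rows only when the caller supplied a real non-default value.

    JSON null arrives as None; without the None check it would be forwarded
    as _args['max_rows'] = None and fail schema validation downstream.
    """
    if max_rows is not None and max_rows != 5000:
        _args["max_rows"] = max_rows
    return _args
```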
```python
# Force CPU before torch is imported anywhere — prevents MPS (Metal) segfaults
# in forked subprocesses (uvx MCP server, tu CLI, Claude Code plugin).
os.environ.setdefault("PYTORCH_MPS_HIGH_WATERMARK_RATIO", "0.0")
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
try:
    import torch

    if hasattr(torch, "set_default_device"):
        torch.set_default_device("cpu")
except ImportError:
```
Importing torch at package import time adds a heavy dependency load and global side effects (setting default device) even for users who never invoke ADMET-AI. This can slow ToolUniverse startup and can prevent GPU/MPS usage in other contexts. Consider limiting the package-level change to setting the env vars, and defer any torch import / set_default_device('cpu') to the ADMET-related code paths only (e.g., inside admetai_tool.py).
```python
from .execute_function import ToolUniverse

tu = ToolUniverse()
tu.load_tools()
```
Creating a new ToolUniverse() and calling load_tools() inside every run() will reload the full tool registry per invocation (potentially thousands of tools), which is expensive and can become a major latency bottleneck for this compound tool. Consider reusing a shared/singleton ToolUniverse instance (module-level cache) or calling the underlying tool registry/client directly to avoid repeated full loads.
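The suggested module-level cache could look like this generic sketch, where `factory` stands in for constructing and loading ToolUniverse (the real code would close over the actual class rather than take a parameter):

```python
_TU = None  # module-level cache: one registry per process


def get_tool_universe(factory):
    """Return a shared instance, constructing and loading it only once.

    factory: zero-arg callable that builds and loads the registry, e.g.
    a small wrapper around ToolUniverse() + load_tools() in real code.
    """
    global _TU
    if _TU is None:
        _TU = factory()
    return _TU
```

Every `run()` then calls `get_tool_universe(...)` and pays the full registry load only on the first invocation.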
```python
response = remove_none_and_empty_values(response)

# Strip ontology closure arrays — they contain ~150 ancestor terms per
# item and inflate responses from ~5KB to 100KB+ without adding signal
# for downstream LLM consumption.
_CLOSURE_KEYS = {"object_closure", "object_closure_label"}


def _strip_closure(obj):
    if isinstance(obj, dict):
        return {
            k: _strip_closure(v)
            for k, v in obj.items()
            if k not in _CLOSURE_KEYS
        }
    if isinstance(obj, list):
        return [_strip_closure(v) for v in obj]
    return obj


response = _strip_closure(response)
if isinstance(response, dict) and "status" not in response:
    return {"status": "success", "data": response}
return response
```
The response post-processing no longer strips object_closure / object_closure_label arrays. Those fields can be very large and were previously removed to keep responses small for downstream consumption. If this was removed unintentionally during the upstream merge, consider restoring the stripping (optionally behind a flag) to avoid large payloads and higher latency/memory usage.
```bash
for skill_dir in "$REPO_ROOT/skills"/*/; do
    dir_name=$(basename "$skill_dir")
    if [ -f "$skill_dir/SKILL.md" ] && [[ "$dir_name" == tooluniverse* ]]; then
        cp -r "$skill_dir" "$DIST_DIR/skills/$dir_name"
        skill_count=$((skill_count + 1))
```
The local plugin build script only copies skills whose directory name starts with tooluniverse*, but plugin/sync-skills.sh includes setup-tooluniverse. This means a locally-built plugin will be missing the setup skill (and any others outside the prefix). Update the filter to include setup-tooluniverse (and any other intended user-facing skills) so local builds match the release artifact behavior.
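The suggested fix could extend the filter along these lines; the whitelist contents are an assumption based on sync-skills.sh as described in the comment, and the function wrapper is added here just to make the sketch self-contained.

```bash
copy_user_facing_skills() {
    # $1 = repo root, $2 = dist dir; prints the number of skills copied.
    repo_root="$1"; dist_dir="$2"; count=0
    mkdir -p "$dist_dir/skills"
    for skill_dir in "$repo_root/skills"/*/; do
        dir_name=$(basename "$skill_dir")
        case "$dir_name" in
            # tooluniverse-* family plus explicitly whitelisted extras
            tooluniverse*|setup-tooluniverse)
                if [ -f "$skill_dir/SKILL.md" ]; then
                    cp -r "$skill_dir" "$dist_dir/skills/$dir_name"
                    count=$((count + 1))
                fi
                ;;
        esac
    done
    echo "$count"
}
```

Keeping the whitelist in one `case` pattern makes it easy to keep build-plugin.sh and sync-skills.sh in sync.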
```python
from .execute_function import ToolUniverse

tu = ToolUniverse()
tu.load_tools()
```
Creating a new ToolUniverse() and calling load_tools() on every run() reloads the full tool registry per call and will be a significant latency hit for this compound tool. Consider caching a ToolUniverse instance or using a lighter-weight execution path to call the underlying tools without re-loading everything each time.
```python
from .execute_function import ToolUniverse

tu = ToolUniverse()
tu.load_tools()
```
Creating a new ToolUniverse() and calling load_tools() on every run() reloads the full tool registry per call and will be a significant latency hit for this compound tool. Consider caching a ToolUniverse instance or using a lighter-weight execution path to call the underlying tools without re-loading everything each time.
```python
_args = {
    k: v
    for k, v in {
        "operation": operation,
        "formula": formula,
        "C": C,
        "H": H,
        "N": N,
        "O": oxygen,
        "S": S,
        "F": F,
        "Cl": Cl,
        "Br": Br,
        "oxygen": oxygen,
        "iodine": iodine,
        "I": iodine,
    }.items()
```
The request payload keys no longer match the tool schema: degrees_of_unsaturation_tools.json defines parameters oxygen and iodine, but this wrapper sends them as O and I. This will cause the shared client to reject/ignore oxygen/iodine arguments. Align the argument dict keys with the JSON tool schema (or update the schema to match).
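Aligning the wrapper with the schema could look like the sketch below; the parameter list mirrors the fragment above, and the helper function name is invented for illustration.

```python
def build_dou_args(operation, formula=None, C=None, H=None, N=None,
                   oxygen=None, S=None, F=None, Cl=None, Br=None, iodine=None):
    """Build the request payload using the JSON schema's parameter names.

    Per the review, degrees_of_unsaturation_tools.json expects 'oxygen' and
    'iodine', so the wrapper must not rename them to 'O'/'I' (nor send both
    spellings, which the original dict did).
    """
    candidate = {
        "operation": operation,
        "formula": formula,
        "C": C,
        "H": H,
        "N": N,
        "oxygen": oxygen,
        "S": S,
        "F": F,
        "Cl": Cl,
        "Br": Br,
        "iodine": iodine,
    }
    return {k: v for k, v in candidate.items() if v is not None}
```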
Summary
Conflict Resolution
Verification
Test plan
🤖 Generated with Claude Code