LLM-based multi-agent framework for interactive analysis of mass spectrometry metabolomics knowledge graphs
Natural-language questions to schema-aware SPARQL · Authoritative entity resolution · Interactive metabolomics data mining
- Hardened the interpreter path by isolating LLM-generated Python behind trusted-mode controls and a subprocess runner instead of in-process execution.
- Improved CLI safety with better file staging validation, collision handling, session-scoped logging, and consistent --api-key propagation through fallback SPARQL paths.
- Made the Streamlit app easier to launch locally by stabilizing its import path handling and aligning the recommended startup command with the tested repo-root workflow.
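The first change above replaces in-process execution of LLM-generated Python with a subprocess runner behind trusted-mode controls. The snippet below is a minimal sketch of that general pattern using only the standard library; it is not MetaboT's implementation, and the names run_generated_code, RunResult, and the trusted flag are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one out-of-process run of generated code."""
    returncode: int
    stdout: str
    stderr: str
    timed_out: bool = False


def run_generated_code(code: str, *, trusted: bool, timeout_s: float = 10.0) -> RunResult:
    """Execute `code` in a separate Python interpreter instead of exec() in-process.

    `trusted` is a hypothetical opt-in flag: nothing runs unless the caller
    explicitly allows it. A crash or infinite loop in the generated code cannot
    take down the parent, and the timeout bounds runaway execution. This is
    process isolation, not a security sandbox: the child keeps the parent's
    file-system and network access.
    """
    if not trusted:
        raise PermissionError("execution of generated code is disabled")
    try:
        proc = subprocess.run(
            # -I (isolated mode) ignores PYTHON* environment variables and user site-packages.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # run() has already killed the child process at this point.
        return RunResult(returncode=-1, stdout="", stderr="timed out", timed_out=True)
    return RunResult(proc.returncode, proc.stdout, proc.stderr)
```

Running the child with a timeout keeps a hung plot or loop from blocking the session; stronger isolation (containers, restricted users) would be a separate layer on top.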
Try the public MetaboT demonstrator at metabot.holobiomicslab.eu. It is connected to the Experimental Natural Products Knowledge Graph (ENPKG), an open metabolomics knowledge graph built from a chemodiverse collection of 1,600 plant extracts.
Full documentation is available at holobiomicslab.github.io/MetaboT.
- What is MetaboT?
- Validation Snapshot
- Architecture Overview
- Installation
- Running MetaboT
- Using Your Own Knowledge Graph
- Documentation
- Citation
- License
MetaboT is an open-source multi-agent framework that translates natural-language metabolomics questions into executable SPARQL queries over knowledge graphs. It was designed to lower the barrier to semantic data mining for researchers working with mass spectrometry data, especially when the underlying graph is rich but difficult to query directly with RDF and SPARQL.
Compared with single-model prompting, MetaboT uses a workflow of specialized agents for question validation, entity resolution, schema-aware query generation, iterative refinement, and result interpretation. In practice, this helps reduce hallucinated identifiers and malformed queries while keeping the interaction conversational.
The latest manuscript reports the following benchmark results on a manually curated ENPKG evaluation set:
| System | Overall accuracy | High-complexity accuracy |
|---|---|---|
| GPT-4o single-shot | 8.16% | 0.00% |
| MetaboT with GPT-4o mini | 12.24% | 15.79% |
| MetaboT with GPT-4o | 83.67% | 78.95% |
These numbers are reported over 49 scored questions from a 50-question benchmark, after excluding one refinement artifact described in the manuscript. The benchmark dataset is included in app/data/evaluation_dataset.csv and archived on Zenodo.
MetaboT's workflow follows six main roles:
- Entry Agent classifies whether a request is a new knowledge question or a follow-up.
- Validator Agent checks whether the question is in scope for the knowledge graph schema.
- Supervisor Agent decides which downstream agents should be used.
- KG Agent resolves entities such as taxa, chemical classes, SMILES strings, and biological targets using external resources. In the current codebase this role is implemented by ENPKG_agent.
- SPARQL Query Runner Agent builds and executes schema-aware SPARQL through GraphSparqlQAChain.
- Interpreter Agent summarizes results and can generate visual outputs when requested.
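To make the hand-offs between these roles concrete, here is a toy, dependency-free sketch of a supervisor-routed pipeline. It only mirrors the order of roles listed above: every function, heuristic, and placeholder value is hypothetical, and the real agents in MetaboT are LLM-driven rather than keyword-based.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Shared state passed between the hypothetical agent steps."""
    question: str
    followup: bool = False
    in_scope: bool = False
    entities: dict[str, str] = field(default_factory=dict)
    sparql: str = ""
    answer: str = ""
    trace: list[str] = field(default_factory=list)


def entry(s: State) -> None:
    # Toy stand-in for the LLM classifier: phrases like "what about" suggest a follow-up.
    s.followup = s.question.lower().startswith(("and ", "what about", "those"))
    s.trace.append("entry")


def validator(s: State) -> None:
    # Toy scope check; a real validator compares the question with the KG schema.
    s.in_scope = any(w in s.question.lower() for w in ("extract", "annotation", "compound"))
    s.trace.append("validator")


def supervisor(s: State) -> list[str]:
    # Choose downstream agents; in this toy, follow-ups skip entity resolution.
    s.trace.append("supervisor")
    return ["sparql", "interpreter"] if s.followup else ["kg", "sparql", "interpreter"]


def kg(s: State) -> None:
    s.entities["taxon"] = "<resolved-taxon-iri>"  # placeholder, no real lookup
    s.trace.append("kg")


def sparql(s: State) -> None:
    s.sparql = "SELECT ... WHERE { ... }"  # placeholder for schema-aware generation
    s.trace.append("sparql")


def interpreter(s: State) -> None:
    s.answer = f"Summary of results for: {s.question}"
    s.trace.append("interpreter")


AGENTS = {"kg": kg, "sparql": sparql, "interpreter": interpreter}


def run(question: str) -> State:
    s = State(question)
    entry(s)
    validator(s)
    if not s.in_scope:
        s.answer = "Question is outside the knowledge graph's scope."
        return s
    for name in supervisor(s):
        AGENTS[name](s)
    return s
```

For example, run("Which lab extracts contain alkaloid compounds?").trace passes through all six roles, while an out-of-scope question stops after the validator.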
Entity resolution is grounded in tools and resources such as Wikidata, ChEMBL, NPClassifier, and GNPS rather than relying only on an LLM's internal memory.
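The key idea is that identifiers come from a lookup rather than from the model's memory, and that an unresolved name is reported as unresolved instead of guessed. A minimal sketch of that fallback pattern follows; resolve, SOURCES, and the dictionary-backed stand-ins are hypothetical, and the identifiers are placeholders rather than real Wikidata or ChEMBL IDs.

```python
from typing import Callable, Optional

# A resolver maps a normalized name to an identifier, or None if unknown.
Resolver = Callable[[str], Optional[str]]


def resolve(name: str, sources: list[tuple[str, Resolver]]) -> Optional[tuple[str, str]]:
    """Return (source, identifier) from the first source that knows `name`.

    Returning None instead of guessing lets the caller ask the user for
    clarification rather than building a query around an invented identifier.
    """
    key = name.strip().lower()
    for source_name, lookup in sources:
        identifier = lookup(key)
        if identifier is not None:
            return source_name, identifier
    return None


# Offline stand-ins for real services; the identifiers are placeholders,
# not real Wikidata or ChEMBL IDs.
fake_wikidata = {"tabernaemontana coffeoides": "wikidata:PLACEHOLDER_TAXON"}.get
fake_chembl = {"leishmania donovani": "chembl:PLACEHOLDER_TARGET"}.get
SOURCES = [("Wikidata", fake_wikidata), ("ChEMBL", fake_chembl)]
```

In practice each entity type (taxon, chemical class, target) would more likely have its own dedicated resolver than share one ordered chain.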
- Python 3.11
- Conda or Miniconda recommended
- An API key for at least one supported LLM provider
- Optional: LangSmith credentials for tracing
The default local setup targets the public ENPKG endpoint, so you can start without deploying your own knowledge graph.
git clone https://github.com/HolobiomicsLab/MetaboT.git
cd MetaboT
conda env create -f environment.yml
conda activate metabot
To launch the application through Streamlit, install the dependencies and run the app from the repository root. In your terminal, execute:
pip install -r requirements.txt
python -m streamlit run streamlit_webapp/streamlit_app.py
This repo-root launch path is the recommended setup for local development and matches the workflow smoke-tested for the Streamlit app in v1.1.0. You can enter your OpenAI API key in the sidebar once the app starts, or preconfigure contributor/admin keys through environment variables if you use those deployment paths.
If you prefer a plain virtual environment instead of Conda:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Create a .env file in the project root:
OPENAI_API_KEY=your_api_key_here
# Optional: override the default ENPKG endpoint
KG_ENDPOINT_URL=https://enpkg.commons-lab.org/graphdb/repositories/ENPKG
# Optional: endpoint authentication
SPARQL_USERNAME=
SPARQL_PASSWORD=
# Optional: tracing
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=MetaboT
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
MetaboT also includes provider mappings for DEEPSEEK_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, MISTRAL_API_KEY, OVHCLOUD_API_KEY, and HUGGINGFACE_API_KEY. See the configuration guide for details.
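Before a first run it can help to confirm which of these variables are actually set. The following stdlib-only sketch parses simple KEY=value lines and reports which provider keys are non-empty and which endpoint would apply. It is not how MetaboT loads its configuration; parse_env_file and summarize are hypothetical helpers, and the fallback endpoint is assumed to match the KG_ENDPOINT_URL example in the template.

```python
import os
from pathlib import Path

# Assumed default, taken from the KG_ENDPOINT_URL example in the template above.
DEFAULT_ENDPOINT = "https://enpkg.commons-lab.org/graphdb/repositories/ENPKG"
PROVIDER_KEYS = (
    "OPENAI_API_KEY", "DEEPSEEK_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY",
    "MISTRAL_API_KEY", "OVHCLOUD_API_KEY", "HUGGINGFACE_API_KEY",
)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def summarize(env: dict[str, str]) -> dict[str, object]:
    """Which provider keys are non-empty, and which endpoint would be used."""
    return {
        "providers": [k for k in PROVIDER_KEYS if env.get(k)],
        "endpoint": env.get("KG_ENDPOINT_URL") or DEFAULT_ENDPOINT,
        "sparql_auth": bool(env.get("SPARQL_USERNAME") and env.get("SPARQL_PASSWORD")),
        "tracing": bool(env.get("LANGCHAIN_API_KEY")),
    }


if __name__ == "__main__":
    env_file = Path(".env")
    file_values = parse_env_file(env_file.read_text()) if env_file.is_file() else {}
    # Variables already exported in the shell take precedence over the file here.
    print(summarize({**file_values, **os.environ}))
```

The summary prints variable names and flags only, never the key values themselves.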
Run a predefined standard question:
python -m app.core.main -q 1
The value passed to -q is an index into standard_questions, defined in app/core/questions.py (module app.core.questions).
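To run several standard questions in a row, the documented command can be scripted. This is a minimal sketch using subprocess; standard_question_command and run_standard_questions are hypothetical helpers, it assumes it is executed from the repository root with the MetaboT environment active, and which indices are valid depends on the contents of standard_questions.

```python
import subprocess
import sys


def standard_question_command(index: int) -> list[str]:
    """argv for `python -m app.core.main -q <index>`, using the active interpreter."""
    return [sys.executable, "-m", "app.core.main", "-q", str(index)]


def run_standard_questions(indices: list[int]) -> dict[int, int]:
    """Run each standard question in turn and collect exit codes by index.

    Must be called from the repository root so that `app` is importable.
    """
    results: dict[int, int] = {}
    for index in indices:
        completed = subprocess.run(standard_question_command(index), check=False)
        results[index] = completed.returncode
    return results
```

For example, run_standard_questions([1, 2]) runs the first two entries one after another and returns their exit codes.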
Run a custom question:
python -m app.core.main -c "What are the SIRIUS structural annotations for Tabernaemontana coffeoides?"
Override the endpoint at runtime:
python -m app.core.main -c "Which lab extracts show inhibition above 50% against Leishmania donovani?" --endpoint https://your-endpoint.example/sparql
Attach a local input file to your question with -f (the file is copied into the session so that the FILE_ANALYZER tool can use it):
python -m app.core.main -c "Summarize the annotations in this file" -f path/to/your_file.csv
MetaboT saves every result set to a CSV file in a temporary folder and returns the file path. Small result sets are also displayed inline; for large ones, only the file path is returned to avoid exceeding the LLM context window.
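The result-handling policy described above (always write a CSV, inline only small result sets) can be sketched as follows. This is an illustration of the idea, not MetaboT's code: save_results, the results_ folder prefix, and INLINE_ROW_LIMIT are hypothetical, and the real size threshold is not documented here.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional

INLINE_ROW_LIMIT = 20  # hypothetical cut-off; the real threshold is not documented here


def save_results(header: list[str], rows: list[list[str]], out_dir: Optional[Path] = None) -> dict:
    """Always write the result set to CSV; include rows inline only when it is small."""
    folder = Path(out_dir) if out_dir is not None else Path(tempfile.mkdtemp(prefix="results_"))
    path = folder / "results.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    payload: dict = {"file_path": str(path), "row_count": len(rows)}
    if len(rows) <= INLINE_ROW_LIMIT:
        payload["rows"] = rows  # small enough to show to the user and the LLM
    return payload
```

Returning the path in both cases means downstream steps can always reload the full table, while the LLM context only ever sees a bounded number of rows.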
This repository includes a LangSmith-based automated evaluation script at app/core/tests/evaluation.py. To run it locally you need a LangSmith API key (LANGCHAIN_API_KEY or LANGSMITH_API_KEY) and an LLM provider key (e.g. OPENAI_API_KEY). See docs/examples/langsmith-evaluation.md.
The repository also includes a Streamlit interface:
pip install -r requirements.txt
export PYTHONPATH="$(pwd):${PYTHONPATH}"
streamlit run streamlit_webapp/streamlit_app.py
After the app starts, enter your OpenAI API key in the Streamlit sidebar under "Set a OpenAI API Key".
To run MetaboT with Docker Compose instead:
docker-compose build
docker-compose run --rm metabot python -m app.core.main -q 1
To apply MetaboT beyond the public ENPKG deployment:
- Convert your processed and annotated mass spectrometry results into a compatible knowledge graph. The ENPKG project is the recommended starting point.
- Deploy a SPARQL endpoint for that graph.
- Set KG_ENDPOINT_URL in .env or pass --endpoint at runtime.
- If your schema differs substantially from ENPKG, update the schema-aware prompts in app/core/agents/validator/prompt.py and app/core/agents/sparql/tool_sparql.py.
Because the prompts are schema-aware, MetaboT can be adapted to other graphs, but a new schema may still require prompt and resolver tuning.
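Before pointing MetaboT at a new endpoint, it is worth confirming that the endpoint answers standard SPARQL requests. The sketch below builds a SPARQL 1.1 Protocol GET request for a trivial ASK query and reads the JSON boolean result. It is an independent check, not part of MetaboT; build_ask_request and endpoint_has_triples are hypothetical helpers, and the use of HTTP Basic authentication for SPARQL_USERNAME/SPARQL_PASSWORD is an assumption.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional

ASK_ANY_TRIPLE = "ASK { ?s ?p ?o }"


def build_ask_request(endpoint: str, username: Optional[str] = None,
                      password: Optional[str] = None) -> urllib.request.Request:
    """GET request carrying the query in the `query` parameter (SPARQL 1.1 Protocol)."""
    separator = "&" if "?" in endpoint else "?"
    url = endpoint + separator + urllib.parse.urlencode({"query": ASK_ANY_TRIPLE})
    request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    if username and password:
        # Assumes HTTP Basic authentication; adapt if your endpoint uses another scheme.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def endpoint_has_triples(endpoint: str, username: Optional[str] = None,
                         password: Optional[str] = None, timeout: float = 10.0) -> bool:
    """True if the endpoint answers the ASK query with `true` (needs network access)."""
    request = build_ask_request(endpoint, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response)["boolean"])
```

A call such as endpoint_has_triples("https://your-endpoint.example/sparql") should return True for a populated graph; an HTTP error or a missing "boolean" field usually points to a wrong URL, authentication, or a non-standard endpoint.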
- Documentation site
- Installation guide
- Quick start
- System overview
- Configuration guide
- Examples
- Contributing guide
If you use MetaboT in research, please cite:
Bekbergenova M, Pradi L, Navet B, Tysinger E, Michel F, Feraud M, Taghzouti Y, Legrand M, Jiang T, Chen YZ, Hassoun S, Kirchhoffer O, Wolfender JL, Mehl F, Pagni M, Bittremieux W, Gandon F, Nothias LF. MetaboT: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge Graphs. Research Square preprint. DOI: 10.21203/rs.3.rs-6591884/v1
The archived evaluated version and benchmark release are available at 10.5281/zenodo.19715403.
MetaboT is released under the Apache 2.0 License. See LICENSE.txt.
MetaboT is a founding proof-of-concept within the MetaboLinkAI program for open AI-assisted metabolomics.
