Add database search to LLM functionality #2

Merged: JR-1991 merged 14 commits into master from toolchain on Sep 10, 2025

JR-1991 commented Sep 10, 2025

This pull request introduces several improvements and new features to the codebase, focusing on enhanced PDF upload functionality, new database search tools, and dependency updates. The most significant changes are the addition of page selection for PDF uploads, a new tool for searching biological and chemical databases, and the inclusion of supporting dependencies.

PDF Upload Enhancements

  • Added support for uploading specific pages from a PDF file via the new pages parameter in the PDFUpload class, including validation, processing, and cleanup methods. This lets users upload only the relevant pages instead of the whole document.
  • Integrated the pdf-lib library to enable PDF manipulation for page extraction.
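
The page validation step mentioned above could look roughly like the following. This is a minimal sketch, not the PR's PDFUpload code: the helper name toPageIndices is hypothetical, and it assumes callers pass 1-based page numbers, which the description does not state.

```typescript
// Hypothetical helper, not taken from the PR's PDFUpload implementation.
// Assumption: callers pass 1-based page numbers.
// Validates the selection and converts it to the 0-based indices that
// pdf-lib's PDFDocument.copyPages accepts.
function toPageIndices(pages: number[], pageCount: number): number[] {
  if (pages.length === 0) {
    throw new Error("At least one page must be selected");
  }
  for (const page of pages) {
    if (!Number.isInteger(page) || page < 1 || page > pageCount) {
      throw new Error(`Invalid page ${page}: document has ${pageCount} pages`);
    }
  }
  // Deduplicate and return the indices in document order.
  const unique = Array.from(new Set(pages.map((page) => page - 1)));
  return unique.sort((a, b) => a - b);
}
```

With pdf-lib, the returned indices could then be handed to copyPages on a newly created document to build the reduced PDF before upload.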

Database Search Tools

  • Introduced a new module src/tools.ts that provides the SearchDatabaseTool and its OpenAI function specification, enabling LLM-powered searches of ChEBI, PDB, PubChem, and UniProt databases.
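
For readers unfamiliar with function-calling specs, a tool specification for such a search might be shaped as below. This is an illustrative sketch only: the tool name, description, and parameter names are assumptions and are not copied from src/tools.ts. It uses the nested Chat Completions layout; the Responses API places name, description, and parameters directly on the tool object instead.

```typescript
// Hypothetical spec: names and fields are illustrative assumptions,
// not the contents of src/tools.ts.
const DATABASES = ["chebi", "pdb", "pubchem", "uniprot"] as const;
type Database = (typeof DATABASES)[number];

const searchDatabaseSpec = {
  type: "function",
  function: {
    name: "search_database",
    description: "Search a biological or chemical database by free-text query.",
    parameters: {
      type: "object",
      properties: {
        // Restricting the value to an enum keeps the model from inventing databases.
        database: { type: "string", enum: DATABASES },
        query: { type: "string", description: "Search term, e.g. a compound or protein name." },
      },
      required: ["database", "query"],
    },
  },
} as const;

// Runtime guard for arguments returned by the model.
function isDatabase(value: string): value is Database {
  return (DATABASES as readonly string[]).indexOf(value) !== -1;
}
```

The guard matters because tool arguments arrive as model-generated JSON and should be checked before a handler issues a request.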

Dependency Updates

  • Added new dependencies to package.json for queueing (p-queue), retry logic (p-retry), timeouts (p-timeout), and PDF processing (pdf-lib).

Type Exports

  • Exported the new ToolChainEvent type from src/llm.ts in src/index.ts to support toolchain event handling.

Introduces src/tools.ts with functions and OpenAI tool specs for searching biological and chemical databases (ChEBI, PDB, PubChem, UniProt). Enables AI assistants to query molecular and protein information via LLM function calling.
The PDFUpload class now supports uploading specific pages from a PDF file using the pdf-lib library. New methods allow page selection, cleanup of temporary files, and improved error handling for invalid page numbers.
Added pdf-lib, p-queue, p-retry, and p-timeout dependencies for PDF manipulation and async control. Removed dotenv from dependencies.
Added ToolChainEvent to the list of exported types from the llm module to make it available for use in other parts of the application.
Introduces tool chain execution with automatic retry, timeout, and concurrency handling for OpenAI streaming responses. Adds types and event callbacks for progress tracking, updates input processing to support tool outputs, and refactors extractData to support async tool execution before streaming. Includes comprehensive documentation and examples for new features.
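
The retry, timeout, and concurrency handling described in the last commit can be illustrated without any dependencies. The PR itself relies on p-queue, p-retry, and p-timeout; the sketch below does not use their APIs, and every name in it (withTimeout, withRetry, runAll) is hypothetical.

```typescript
// Dependency-free illustration of the pattern; all names are hypothetical.
type ToolCall<T> = () => Promise<T>;

// Rejects if the promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// Runs the call once, then up to `retries` more times if it fails or times out.
async function withRetry<T>(call: ToolCall<T>, retries: number, timeoutMs: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(call(), timeoutMs);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Executes all calls with at most `concurrency` in flight, preserving input order.
async function runAll<T>(
  calls: ToolCall<T>[],
  concurrency: number,
  retries = 2,
  timeoutMs = 5000
): Promise<T[]> {
  const results: T[] = new Array(calls.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < calls.length) {
      const index = next++;
      results[index] = await withRetry(calls[index], retries, timeoutMs);
    }
  }
  const workerCount = Math.min(concurrency, calls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Note that a timed-out call is abandoned rather than cancelled here; a real implementation would likely also pass an abort signal to the underlying request.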
JR-1991 requested a review from Copilot September 10, 2025 06:43
JR-1991 self-assigned this Sep 10, 2025
JR-1991 added this to EnzymeML Sep 10, 2025
JR-1991 added the enhancement label (New feature or request) Sep 10, 2025

Updated documentation and type definitions to include PubChem as a supported database in the EnzymeML search tools. The changes clarify usage and parameters for searching multiple databases, including ChEBI, PDB, PubChem, and UniProt.
Adds the 'tool_choice: "required"' parameter to the planToolCalls function's OpenAI API call to ensure that a tool is always selected during reasoning.
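
As a rough illustration of the tool_choice change, a planning request can be forced to select a tool by adding one field to the request parameters. The helper below is a hypothetical sketch and says nothing about how planToolCalls is actually structured; only the tool_choice: "required" setting comes from the commit message.

```typescript
// Hypothetical helper; planToolCalls in the PR may be organised differently.
// Returns a copy of the request parameters with tool selection forced.
function requireToolCall<P extends { tools: object[] }>(
  params: P
): P & { tool_choice: "required" } {
  if (params.tools.length === 0) {
    // Forcing a tool call makes no sense without tools, so fail early.
    throw new Error("tool_choice 'required' needs at least one tool");
  }
  return { ...params, tool_choice: "required" as const };
}
```

The default behaviour ("auto") lets the model answer in plain text instead of calling a tool, which is presumably what the commit wanted to rule out during the planning step.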
JR-1991 requested a review from Copilot September 10, 2025 06:52
Copilot AI left a comment

Pull Request Overview

This PR adds enhanced database search functionality to LLM tools and improves PDF upload capabilities with page selection. The main purpose is to enable AI assistants to search biological and chemical databases and allow users to upload specific pages from PDF documents.

  • Added new database search tools supporting ChEBI, PDB, PubChem, and UniProt with integrated LLM function calling
  • Enhanced PDF upload functionality with page selection and validation using pdf-lib
  • Implemented comprehensive tool chain execution with parallel processing, retries, and progress tracking

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/tools.ts | New module providing database search tools and OpenAI function specifications |
| src/llm.ts | Enhanced with tool chain execution, event tracking, and integrated database search capabilities |
| src/input-types.ts | Added page selection support to PDFUpload class with validation and cleanup methods |
| src/index.ts | Exported new ToolChainEvent type for external consumption |
| package.json | Added dependencies for queue management, retry logic, timeouts, and PDF processing |

Adds exports for SearchDatabaseTool and SearchDatabaseToolSpecs from the tools module to make them available for external usage.
Eliminated the cleanup() method and related documentation from the PDFUpload class, as temporary file management is no longer required. Also refactored stream import to use ES module syntax.
Replaces the default value of the 'tools' parameter in extractData with a configurable value, allowing callers to specify their own tool list instead of always using SearchDatabaseToolSpecs.
Cleaned up import statements in llm.ts by removing unused UserQuery and SearchDatabaseToolSpecs imports to improve code clarity.
Replaces the previous Tool array with a new ToolDefinition type that pairs tool specs with their handler functions. Updates the tool chain logic and SearchDatabaseTool implementation to use this structure, improving extensibility and clarity in tool management.
The SearchDatabaseToolSpecs export was removed from src/index.ts as it is no longer needed.
Updated the package version to 1.5.0 and removed the 'example' script from the npm scripts section.
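
The ToolDefinition refactor mentioned in the commits above pairs each tool spec with its handler, so dispatch becomes a lookup by name. The sketch below shows the general idea under stated assumptions: the field names specs and handler, the ToolSpec shape, and the dispatch function are hypothetical and may not match the types in the PR.

```typescript
// Hypothetical shapes; the field names in the PR's ToolDefinition may differ.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

interface ToolDefinition {
  specs: ToolSpec;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

// Finds the tool the model asked for and runs its handler with the
// JSON-encoded arguments from the tool call.
async function dispatch(
  tools: ToolDefinition[],
  name: string,
  rawArgs: string
): Promise<unknown> {
  const tool = tools.find((candidate) => candidate.specs.name === name);
  if (!tool) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return tool.handler(JSON.parse(rawArgs));
}
```

Keeping spec and handler together means a caller who passes a custom tool list to extractData cannot register a spec without also supplying the code that executes it.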
JR-1991 merged commit 5599250 into master Sep 10, 2025
1 check passed
github-project-automation bot moved this to Done in EnzymeML Sep 10, 2025
JR-1991 deleted the toolchain branch September 10, 2025 08:04