Feat/litellm llmclient abstraction #107
Conversation
We use langchain_redis.RedisVectorStore for the embedding calls. It expects the langchain Embeddings object and calls embed_query()/embed_documents() internally. If we move away from LangChain VectorStore to RedisVL directly, we can use either LiteLLM (via LLMClient.create_embedding()) or RedisVL's built-in vectorizers for embeddings. Since the repo now has LLMClient built as a unified abstraction with native async support, it would be nice to stick with LiteLLM. It gives broader provider coverage without introducing another embedding interface into the codebase. For now this is not implemented.
Pull request overview
This PR replaces custom LLM wrapper classes with a unified LLMClient abstraction layer backed by LiteLLM, significantly reducing maintenance burden and enabling support for multiple providers without code changes.
Key Changes:
- Introduced new `agent_memory_server/llm/` package with `LLMClient` facade class
- Removed `agent_memory_server/llms.py` (454 lines) and replaced with cleaner abstraction
- Updated all LLM-consuming modules to use `LLMClient.create_chat_completion()` and `LLMClient.create_embedding()`
Reviewed changes
Copilot reviewed 67 out of 75 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `agent_memory_server/llm/client.py` | New `LLMClient` facade with chat completion, embedding, and query optimization methods |
| `agent_memory_server/llm/types.py` | Standardized response dataclasses (`ChatCompletionResponse`, `EmbeddingResponse`) |
| `agent_memory_server/llm/exceptions.py` | Custom exception hierarchy for LLM operations |
| `agent_memory_server/llm/__init__.py` | Clean re-exports for package consumers |
| `agent_memory_server/main.py` | Updated model validation to use `LLMClient.get_model_config()` |
| `agent_memory_server/summarization.py` | Removed client parameter, uses `LLMClient` directly |
| `agent_memory_server/memory_strategies.py` | Updated to use `LLMClient.create_chat_completion()` |
| `agent_memory_server/long_term_memory.py` | Removed `llm_client` parameters throughout |
| `agent_memory_server/vectorstore_factory.py` | Delegates embedding creation to `LLMClient.create_embeddings()` |
| `agent_memory_server/extraction.py` | Updated to use `LLMClient.create_chat_completion()` |
| `agent_memory_server/api.py` | Uses `LLMClient.get_model_config()` for model configuration |
| `tests/test_llm_client.py` | New comprehensive test suite for `LLMClient` |
| `uv.lock`, `pyproject.toml` | Added `litellm>=1.80.11` dependency |
| Multiple test files | Updated mocks to use `ChatCompletionResponse` dataclass |
| Java client files | Whitespace cleanup (trailing whitespace removal) |
| Documentation files | Updated references from `llms.py` to the `llm` package |
The PR is well-structured and follows good software engineering practices with proper abstraction, testing, and documentation updates. No critical issues found.
- Create unified LLMClient class with static methods for all LLM operations
- Add async create_chat_completion() using litellm.acompletion()
- Add async create_embedding() using litellm.aembedding()
- Add get_model_config() with fallback chain: MODEL_CONFIGS → LiteLLM → defaults
- Add optimize_query() for vector search query optimization
- Define standardized ChatCompletionResponse and EmbeddingResponse dataclasses
- Add LLMBackend Protocol for test injection
- Add custom exception hierarchy:
  - LLMClientError (base)
  - ModelValidationError
  - APIKeyMissingError
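Putting these pieces together, here is a minimal sketch of the facade shape this commit describes; the method signatures, dataclass fields, and defaults are illustrative assumptions, not the repository's actual code.

```python
# Sketch only: simplified version of the LLMClient facade (assumed shapes).
from dataclasses import dataclass

from litellm import acompletion, aembedding


@dataclass(frozen=True)
class ChatCompletionResponse:
    content: str
    model: str
    total_tokens: int = 0  # exact field set is an assumption


class LLMClient:
    @staticmethod
    async def create_chat_completion(model: str, prompt: str, **kwargs) -> ChatCompletionResponse:
        # LiteLLM accepts OpenAI-style message lists for any supported provider.
        response = await acompletion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        choice = response.choices[0]
        return ChatCompletionResponse(
            content=choice.message.content or "",
            model=response.model,
            total_tokens=getattr(response.usage, "total_tokens", 0),
        )

    @staticmethod
    async def create_embedding(model: str, texts: list[str], **kwargs) -> list[list[float]]:
        # aembedding returns one vector per input text.
        response = await aembedding(model=model, input=texts, **kwargs)
        return [item["embedding"] for item in response.data]
```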
- Remove llms.py module (replaced by llm_client.py)
- Update api.py to use LLMClient.get_model_config()
- Update extraction.py to use LLMClient.create_chat_completion()
- Update long_term_memory.py to use LLMClient
- Update memory_strategies.py to use LLMClient
- Update summarization.py to use LLMClient

BREAKING CHANGE: llms.py module removed, use llm_client.py instead
- Use LLMClient.get_model_config() for model resolution
- Add custom exception handling with ModelValidationError and APIKeyMissingError
- Validate API keys based on the resolved provider
- Improve error messages with abstracted terminology
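A hedged sketch of the provider-based key validation this commit describes; the env-var mapping, the returned config shape, and the helper name are assumptions, and the import path uses the `agent_memory_server.llm` package introduced later in this PR.

```python
# Illustrative only: startup-time API key validation (assumed names and config shape).
import os

from agent_memory_server.llm import APIKeyMissingError, LLMClient

PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def validate_model_api_key(model_name: str) -> None:
    # get_model_config() resolves the provider and raises ModelValidationError for unknown models.
    config = LLMClient.get_model_config(model_name)
    env_var = PROVIDER_ENV_VARS.get(config["provider"])  # config shape is assumed
    if env_var and not os.environ.get(env_var):
        raise APIKeyMissingError(f"{env_var} must be set to use model {model_name!r}")
```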
- Add test_llm_client.py with unit tests for LLMClient
- Update conftest.py with LLMClient mock fixtures
- Update test_llms.py imports for compatibility
- Update test_api.py imports
- Update test_contextual_grounding.py imports
- Update test_contextual_grounding_integration.py imports
- Update test_llm_judge_evaluation.py imports
- Update test_long_term_memory.py imports
- Update test_memory_compaction.py imports
- Update test_memory_strategies.py imports
- Update test_no_worker_mode.py imports
- Update test_query_optimization_errors.py imports
- Update test_summarization.py imports
- Update pyproject.toml with any new dependencies
- Sync uv.lock file
- Add LLMClient documentation for AI assistant context
- Update project structure references
- Exclude dev_docs directory from version control
- Keep development documentation local only
Split the monolithic llm_client.py into a proper package structure:
- llm/exceptions.py: LLMClientError, ModelValidationError, APIKeyMissingError
- llm/types.py: ChatCompletionResponse, EmbeddingResponse, LLMBackend Protocol
- llm/client.py: LLMClient class with all methods
- llm/__init__.py: Re-exports for clean imports

This improves code organization and makes it easier to:
- Find specific components (exceptions, types, client)
- Test individual modules
- Extend the package with new backends
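For illustration, the `llm/__init__.py` re-exports might look roughly like this (the exact `__all__` list is an assumption):

```python
# Sketch only: re-export the public surface of the agent_memory_server.llm package.
from agent_memory_server.llm.client import LLMClient
from agent_memory_server.llm.exceptions import (
    APIKeyMissingError,
    LLMClientError,
    ModelValidationError,
)
from agent_memory_server.llm.types import ChatCompletionResponse, EmbeddingResponse

__all__ = [
    "LLMClient",
    "LLMClientError",
    "ModelValidationError",
    "APIKeyMissingError",
    "ChatCompletionResponse",
    "EmbeddingResponse",
]
```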
Update all application code imports from:
from agent_memory_server.llm_client import ...
to:
from agent_memory_server.llm import ...

Files updated:
- api.py
- extraction.py
- long_term_memory.py
- main.py
- memory_strategies.py
- summarization.py
Update all test file imports from:
from agent_memory_server.llm_client import ...
to:
from agent_memory_server.llm import ...

Update mock patch paths from:
agent_memory_server.llm_client.LLMClient
to:
agent_memory_server.llm.client.LLMClient

Also removed the duplicate mock_llm_client fixture in conftest.py.
- Add LLMClient.create_embeddings() classmethod that returns LangChain Embeddings
- Extract _create_openai_embeddings() and _create_bedrock_embeddings() helpers
- Update vectorstore_factory.create_embeddings() to delegate to LLMClient
- Update tests to patch agent_memory_server.config.settings
- Add Embeddings to TYPE_CHECKING imports for proper type hints

This makes LLMClient the single entry point for all LLM operations:
- create_chat_completion() for chat
- create_embedding() for async embedding vectors
- create_embeddings() for LangChain Embeddings instances
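A rough sketch of the `create_embeddings()` delegation described here; the settings attribute names and the provider switch are assumptions, and only the OpenAI branch is shown.

```python
# Sketch only: LLMClient.create_embeddings() returning a LangChain Embeddings instance.
from langchain_core.embeddings import Embeddings
from langchain_openai import OpenAIEmbeddings

from agent_memory_server.config import settings
from agent_memory_server.llm.exceptions import ModelValidationError


class LLMClient:
    @classmethod
    def create_embeddings(cls) -> Embeddings:
        provider = settings.embedding_model_provider  # setting name is an assumption
        if provider == "openai":
            return cls._create_openai_embeddings()
        raise ModelValidationError(f"Unsupported embedding provider {provider!r}")

    @classmethod
    def _create_openai_embeddings(cls) -> Embeddings:
        return OpenAIEmbeddings(model=settings.embedding_model)  # setting name is an assumption
```

With this in place, vectorstore_factory.create_embeddings() can simply return LLMClient.create_embeddings() instead of constructing provider-specific objects itself.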
- Fix test_memory_prompt_with_model_name to patch agent_memory_server.api.LLMClient.get_model_config
- Remove commented-out code from tests/conftest.py, tests/test_long_term_memory.py, tests/test_llm_client.py
vishal-bala
left a comment
I'd be curious to know your thinking behind opting for the Client/Backend abstraction as opposed to a general Client abstract base class (that has a LiteLLM implementation subclass). I think both approaches can work here, but I feel like the single global LLMClient object with a reconfigurable LLMBackend comes with some potential Python scoping gotchas.
This was more of a higher-level look-through - I saw that LLMClient currently calls LiteLLM directly. Is that intentional and meant to be abstracted in the future, or actually part of the design?
```python
# Merge provider-specific kwargs
call_kwargs.update(kwargs)

response = await acompletion(**call_kwargs)
```
If LiteLLM is supposed to be the backend to the agnostic LLMClient, wouldn't we rather have a generic call to an LLMBackend method here rather than locking in LiteLLM?
```
LLMClient (facade)
├── _backend: LLMBackend | None   # Only set in tests
│
├── create_chat_completion()
│     if _backend: use it (tests)
│     else: call litellm.acompletion() directly
│
└── create_embedding()
      if _backend: use it (tests)
      else: call litellm.aembedding() directly
```
The _backend is only for tests; we wouldn't swap LiteLLM for something else. The protocol lets tests inject mocks without hitting real APIs.
I did consider what you said; we'd only want a different backend if we wanted to swap LiteLLM entirely for something else, like:
- Direct SDK calls (openai, anthropic, boto3)
- A different unified library/custom gateway client
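For context, a minimal sketch of the kind of Protocol being discussed (shape assumed; as noted further down in this thread, the test-injection mechanism is later removed in favor of standard mocks):

```python
# Sketch only: a structural Protocol that test doubles could satisfy.
from typing import Any, Protocol


class LLMBackend(Protocol):
    async def create_chat_completion(self, model: str, messages: list[dict], **kwargs: Any) -> Any: ...

    async def create_embedding(self, model: str, texts: list[str], **kwargs: Any) -> Any: ...


# Production code calls LiteLLM directly when no backend is injected;
# tests could set any object matching this Protocol to avoid real API calls.
```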
And my thought was to revisit this when we need it. I did not want to create a placeholder abstraction without a concrete need yet.
@vishal-bala
Open to an alternative. Actually, this was a good discussion - I realized the _backend and LLMBackend names are confusing. I am changing it to TestLLMHandler (which is the protocol tests should follow to mock).

Update: I removed the _backend/_test_handler and LLMBackend Protocol and replaced them with standard mocks.
…ptions
- Unknown embedding model now raises ModelValidationError instead of defaulting to OpenAI
- Anthropic embedding request raises ModelValidationError (Anthropic has no embedding models)
- Unsupported provider in _map_provider raises ModelValidationError instead of defaulting to OpenAI
- Removed vertex_ai and gemini fallback mappings
…eModel
- ChatCompletionResponse and EmbeddingResponse now use Pydantic BaseModel
- Maintains frozen=True behavior via ConfigDict(frozen=True)
- Aligns with the project's consistent use of Pydantic models
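A small sketch of the frozen Pydantic model described in this commit; fields other than `content` are assumptions.

```python
# Sketch only: immutable response model via Pydantic's ConfigDict(frozen=True).
from pydantic import BaseModel, ConfigDict


class ChatCompletionResponse(BaseModel):
    model_config = ConfigDict(frozen=True)

    content: str
    model: str = ""        # assumed field
    total_tokens: int = 0  # assumed field


resp = ChatCompletionResponse(content="hello")
# resp.content = "other"  # would raise a pydantic ValidationError because the model is frozen
```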
- Update frozen tests to expect Pydantic ValidationError instead of AttributeError
- Change None content test to empty string (Pydantic validates str type)
…anism
- Remove TestLLMHandler Protocol from types.py
- Remove _test_handler class variable from LLMClient
- Remove set_test_handler() and clear_test_handler() methods
- Remove conditional dispatch in create_chat_completion() and create_embedding()
- Remove TestLLMHandler from __init__.py exports

This eliminates test-only code paths from production code and removes global mutable state from the otherwise stateless LLMClient facade.
- Remove MockLLMHandler class and clear_llm_client_test_handler fixture
- Remove TestLLMClientClearTestHandler and TestLLMClientTestHandlerInjection classes
- Refactor TestLLMClientChatCompletion to use @patch on acompletion
- Refactor TestLLMClientEmbedding to use @patch on aembedding
- Add helper functions for creating mock LiteLLM responses

This aligns LLM client tests with the standard mocking patterns used throughout the rest of the test suite.
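An illustrative example of this @patch-based style; the helper and test names are invented for the sketch, and the patch target assumes client.py imports acompletion directly, as the quoted diff above shows.

```python
# Sketch only: mocking LiteLLM at the call site instead of injecting a test handler.
from types import SimpleNamespace
from unittest.mock import AsyncMock, patch

import pytest

from agent_memory_server.llm import LLMClient


def make_mock_completion(content: str = "hello"):
    """Build an object shaped like LiteLLM's ModelResponse for the fields the client reads."""
    return SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content=content))],
        model="gpt-4o",
        usage=SimpleNamespace(total_tokens=3),
    )


@pytest.mark.asyncio
async def test_create_chat_completion_returns_content():
    with patch(
        "agent_memory_server.llm.client.acompletion",
        new=AsyncMock(return_value=make_mock_completion("summary")),
    ):
        response = await LLMClient.create_chat_completion(  # signature is assumed
            model="gpt-4o", prompt="Summarize this."
        )
    assert response.content == "summary"
```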
- test_create_embeddings_unsupported_provider: expect ModelValidationError instead of ValueError to match the actual exception raised by LLMClient
- test_optimization_with_none_content_response: mock at the LiteLLM level instead of creating an invalid ChatCompletionResponse with content=None
Replace manual single-quote wrapping with Python's !r (repr) format specifier for more idiomatic string formatting that also handles edge cases like values containing quotes.

Changed locations:
- Unknown embedding model error message
- Unsupported LiteLLM provider error message
- Model not found warning message
- Query optimization debug/warning messages
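A minimal illustration of the change, using a hypothetical value:

```python
# The !r conversion applies repr(), so quoting (and any embedded quotes) is handled for us.
model = "text-embedding'3-small"  # hypothetical value containing a quote
print(f"Unknown embedding model {model!r}")
# Output: Unknown embedding model "text-embedding'3-small"
```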
abrookins
left a comment
Looks amazing! LGTM 👍
- Add TestLLMClientIntegrationHuggingFace class with tests for:
  - Chat completion via huggingface/meta-llama/Llama-3.3-70B-Instruct
  - Embeddings via huggingface/microsoft/codebert-base
- Add TestLLMClientIntegrationGemini class with a test for:
  - Chat completion via gemini/gemini-1.5-flash
- Tests require HF_TOKEN and GEMINI_API_KEY respectively
- Tests skip gracefully when API keys are not set
- Run with: uv run pytest tests/test_llm_client.py --run-api-tests
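A hedged sketch of one such key-gated test; the test body and the `create_chat_completion()` signature are assumptions, and the real suite is additionally gated behind `--run-api-tests` as noted above, while this sketch uses a plain `skipif` for brevity.

```python
# Sketch only: integration test that skips gracefully when the API key is not set.
import os

import pytest

from agent_memory_server.llm import LLMClient


@pytest.mark.asyncio
@pytest.mark.skipif(not os.environ.get("GEMINI_API_KEY"), reason="GEMINI_API_KEY not set")
async def test_gemini_chat_completion():
    response = await LLMClient.create_chat_completion(  # signature is assumed
        model="gemini/gemini-1.5-flash",
        prompt="Reply with the single word: pong",
    )
    assert response.content  # any non-empty reply counts as success
```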
Summary
This PR replaces the custom LLM wrapper classes in `llms.py` with a unified `LLMClient` abstraction layer backed by LiteLLM. All LLM operations (chat completions, embeddings, and LangChain Embeddings instances) now go through a single entry point, reducing maintenance burden and enabling support for multiple providers without code changes.

Issue: #105
Problem
The previous `llms.py` module had several issues:
- A separate hand-written wrapper class per provider (`OpenAIClientWrapper`, `AnthropicClientWrapper`, `BedrockClientWrapper`)
- Model metadata maintained manually in the `MODEL_NAME_MAP` dictionary

Solution
Replace the custom wrappers with `LLMClient`, a facade backed by LiteLLM.

Why this reduces maintenance:
- LiteLLM handles provider differences, so adding providers requires no new wrapper code
- The `LLMBackend` Protocol allows replacing LiteLLM without touching application code

Architecture
Key Design Decisions:
- Call sites use `LLMClient.create_chat_completion()` directly
- Responses are normalized into `ChatCompletionResponse` and `EmbeddingResponse` dataclasses
- `LLMBackend` Protocol for mock injection via `set_backend()`
- Custom exceptions: `LLMClientError`, `ModelValidationError`, `APIKeyMissingError`
- Model config resolution: `MODEL_CONFIGS` → LiteLLM `get_model_info()` → fallback (sketched below)
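A simplified sketch of the fallback chain from the last bullet; the `MODEL_CONFIGS` contents and the returned dict shape are assumptions.

```python
# Sketch only: resolve model metadata from project overrides, then LiteLLM, then defaults.
import litellm

MODEL_CONFIGS = {
    "gpt-4o": {"provider": "openai", "max_tokens": 128_000},  # example entry (assumed)
}
DEFAULT_CONFIG = {"provider": "openai", "max_tokens": 8_192}  # assumed defaults


def get_model_config(model_name: str) -> dict:
    # 1. Project-level overrides win.
    if model_name in MODEL_CONFIGS:
        return MODEL_CONFIGS[model_name]
    # 2. Ask LiteLLM's model registry.
    try:
        info = litellm.get_model_info(model_name)
        return {"provider": info.get("litellm_provider"), "max_tokens": info.get("max_tokens")}
    except Exception:
        # 3. Fall back to conservative defaults for unknown models.
        return DEFAULT_CONFIG
```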
Changes

New: `agent_memory_server/llm/` package:
- `client.py` - `LLMClient` with `create_chat_completion()`, `create_embedding()`, `create_embeddings()`, `get_model_config()`, `optimize_query()`
- `types.py` - `ChatCompletionResponse`, `EmbeddingResponse`, `LLMBackend` Protocol
- `exceptions.py` - `LLMClientError`, `ModelValidationError`, `APIKeyMissingError`
- `__init__.py` - Re-exports for clean imports

Removed:
- `agent_memory_server/llms.py`

Updated:
- `main.py` - Startup validation via `LLMClient.get_model_config()`
- `extraction.py` - `LLMClient.create_chat_completion()`
- `summarization.py` - `LLMClient.create_chat_completion()`
- `memory_strategies.py` - `LLMClient.create_chat_completion()`
- `long_term_memory.py` - `LLMClient.create_embedding()`
- `api.py` - `LLMClient.get_model_config()`
- `vectorstore_factory.py` - Delegates to `LLMClient.create_embeddings()`

Migration Guide
Breaking Changes
None. This is an internal refactor. External APIs remain unchanged.
Testing
- `tests/test_llm_client.py` with unit tests for the new client

Future Work
Option: Move to RedisVL Directly
Currently, `langchain_redis.RedisVectorStore` owns the embedding calls: it expects an `Embeddings` object and calls `embed_query()`/`embed_documents()` internally. Moving to RedisVL directly would give us full control over how embeddings are produced.

Trade-offs: embeddings could then come either from RedisVL's built-in vectorizers or from `LLMClient`.

Recommendation: If we migrate to RedisVL, continue using `LLMClient.create_embedding()` rather than RedisVL vectorizers: broader provider support, and the abstraction is already built.
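As a sketch of that recommendation, querying RedisVL directly while sourcing vectors from `LLMClient.create_embedding()` might look roughly like this; the index schema, field names, and the `create_embedding()` signature are assumptions.

```python
# Sketch only: RedisVL vector search fed by embeddings from the unified LLMClient facade.
from redisvl.index import AsyncSearchIndex
from redisvl.query import VectorQuery

from agent_memory_server.llm import LLMClient


async def search_memories(index: AsyncSearchIndex, text: str, model: str = "text-embedding-3-small"):
    # One embedding for the query text, produced through LiteLLM via the facade.
    [vector] = await LLMClient.create_embedding(model=model, texts=[text])  # signature is assumed
    query = VectorQuery(
        vector=vector,
        vector_field_name="vector",   # depends on the index schema
        return_fields=["text", "id"],
        num_results=10,
    )
    return await index.query(query)
```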