Conversation

nkanu17 (Contributor) commented Jan 6, 2026

Summary

This PR replaces the custom LLM wrapper classes in llms.py with a unified LLMClient abstraction layer backed by LiteLLM. All LLM operations—chat completions, embeddings, and LangChain Embeddings instances—now go through a single entry point, reducing maintenance burden and enabling support for multiple providers without code changes.

Issue: #105


Problem

The previous llms.py module had several issues:

| Issue | Impact |
|---|---|
| Multiple wrapper classes (OpenAIClientWrapper, AnthropicClientWrapper, BedrockClientWrapper) | Duplicated logic, maintenance overhead |
| Manual MODEL_NAME_MAP dictionary | Required updates for every new model |
| Different response parsing per provider | Inconsistent error handling |
| Adding a provider = new wrapper class | High friction for expansion |

Solution

Replace custom wrappers with LLMClient, a facade backed by LiteLLM:

| Before | After |
|---|---|
| 3 wrapper classes | 1 unified client |
| Manual provider detection | Automatic from model name |
| Custom response parsing | Standardized dataclasses |
| Adding a provider = new class | Adding a provider = just use it |

Why this reduces maintenance:

  1. Provider API changes are isolated - LiteLLM handles SDK updates; we just bump the dependency
  2. Zero-effort provider expansion - Gemini, Cohere, Ollama work without code changes
  3. Centralized bug fixes - One place for logging, retries, error handling
  4. Backend swappability - LLMBackend Protocol allows replacing LiteLLM without touching application code

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CALL SITES                                     │
│  extraction.py | summarization.py | memory_strategies.py | api.py | main.py│
│                                                                             │
│     await LLMClient.create_chat_completion(model, messages, ...)            │
│     await LLMClient.create_embedding(model, input_texts, ...)               │
│     LLMClient.create_embeddings()  → LangChain Embeddings instance          │
│     LLMClient.get_model_config(model_name)                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            LLMClient                                        │
│                   (Facade - Public API, Never Changes)                      │
│                                                                             │
│  create_chat_completion() → ChatCompletionResponse                          │
│  create_embedding()       → EmbeddingResponse                               │
│  create_embeddings()      → LangChain Embeddings (for vector stores)        │
│  get_model_config()       → ModelConfig                                     │
│  set_backend() / reset()  → For testing                                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     Internal: LiteLLM (hidden, swappable)                   │
│                                                                             │
│  acompletion() | aembedding() | get_model_info()                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  OpenAI  │  Anthropic  │  AWS Bedrock  │  Gemini  │  Ollama  │  100+ more   │
└─────────────────────────────────────────────────────────────────────────────┘

Key Design Decisions:

  1. Static class methods - No instantiation; call LLMClient.create_chat_completion() directly
  2. Standardized responses - ChatCompletionResponse and EmbeddingResponse dataclasses
  3. Protocol-based testing - LLMBackend Protocol for mock injection via set_backend()
  4. Exception hierarchy - LLMClientError, ModelValidationError, APIKeyMissingError
  5. Model config resolution - MODEL_CONFIGS → LiteLLM get_model_info() → fallback (see the sketch below)

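To make decision 5 concrete, here is a minimal sketch of the resolution chain. The names ModelConfig, MODEL_CONFIGS, and get_model_config come from this description; the fields, defaults, and bodies shown are illustrative assumptions, not the actual implementation.

# Illustrative sketch of the config resolution chain (not the real code).
# Assumes LiteLLM's get_model_info(), which returns a dict of model metadata
# and raises if the model is unknown.
from dataclasses import dataclass

from litellm import get_model_info


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    max_tokens: int


# 1) Local overrides checked first
MODEL_CONFIGS: dict[str, ModelConfig] = {
    "gpt-4o": ModelConfig(provider="openai", max_tokens=128_000),
}

# 3) Conservative fallback when neither source knows the model (values assumed)
DEFAULT_CONFIG = ModelConfig(provider="openai", max_tokens=8_192)


def get_model_config(model_name: str) -> ModelConfig:
    if model_name in MODEL_CONFIGS:
        return MODEL_CONFIGS[model_name]
    try:
        # 2) Ask LiteLLM's model registry
        info = get_model_info(model_name)
        return ModelConfig(
            provider=info["litellm_provider"],
            max_tokens=info.get("max_input_tokens") or DEFAULT_CONFIG.max_tokens,
        )
    except Exception:
        return DEFAULT_CONFIG
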
Changes

New:

  • agent_memory_server/llm/ package:
    • client.py - LLMClient with create_chat_completion(), create_embedding(), create_embeddings(), get_model_config(), optimize_query()
    • types.py - ChatCompletionResponse, EmbeddingResponse, LLMBackend Protocol
    • exceptions.py - LLMClientError, ModelValidationError, APIKeyMissingError
    • __init__.py - Re-exports for clean imports

Removed:

  • agent_memory_server/llms.py

Updated:

  • main.py - Startup validation via LLMClient.get_model_config()
  • extraction.py - LLMClient.create_chat_completion()
  • summarization.py - LLMClient.create_chat_completion()
  • memory_strategies.py - LLMClient.create_chat_completion()
  • long_term_memory.py - LLMClient.create_embedding()
  • api.py - LLMClient.get_model_config()
  • vectorstore_factory.py - Delegates to LLMClient.create_embeddings()
  • All affected test files

Migration Guide

# Before
from agent_memory_server.llms import get_model_client
client = await get_model_client(model_name)
response = await client.create_chat_completion(model=model_name, prompt=messages)
text = response.choices[0].message.content

# After
from agent_memory_server.llm import LLMClient
response = await LLMClient.create_chat_completion(model=model_name, messages=messages)
text = response.content

# Embeddings - Before
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Embeddings - After
from agent_memory_server.llm import LLMClient
embeddings = LLMClient.create_embeddings()  # Uses configured provider

# Custom endpoints / gateways
response = await LLMClient.create_chat_completion(
    model="gpt-4o",
    messages=messages,
    api_base="https://your-gateway.example.com",
    api_key="your-gateway-key"
)

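The new exception hierarchy can also be caught at call sites. A minimal, hedged sketch, assuming the exceptions are re-exported from agent_memory_server.llm; which exception a particular call raises is implementation-dependent:

# Hedged example: catching the new exception hierarchy around a call
from agent_memory_server.llm import (
    APIKeyMissingError,
    LLMClient,
    LLMClientError,
    ModelValidationError,
)

try:
    response = await LLMClient.create_chat_completion(
        model=model_name, messages=messages
    )
except ModelValidationError:
    ...  # unknown or unsupported model name
except APIKeyMissingError:
    ...  # required provider API key is not configured
except LLMClientError:
    ...  # base class: any other LLM client failure
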
Breaking Changes

None. This is an internal refactor. External APIs remain unchanged.


Testing

  • All tests pass
  • Added tests/test_llm_client.py with unit tests for the new client (mocking pattern sketched below)

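For reference, a sketch of the mocking pattern these tests follow after the refactor (per the commit notes, @patch on acompletion rather than a test-only backend). The patch target and response shape are assumptions, not copied from the actual test file; they presume client.py imports acompletion from LiteLLM directly.

# Sketch of a unit test that stubs the LiteLLM call instead of hitting real APIs.
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from agent_memory_server.llm import LLMClient


def make_mock_litellm_response(content: str) -> MagicMock:
    """Build an object shaped like LiteLLM's chat completion response."""
    response = MagicMock()
    response.choices = [MagicMock()]
    response.choices[0].message.content = content
    return response


@pytest.mark.asyncio
async def test_create_chat_completion_returns_content():
    mock_response = make_mock_litellm_response("hello")
    with patch(
        "agent_memory_server.llm.client.acompletion",  # assumed patch target
        new=AsyncMock(return_value=mock_response),
    ):
        result = await LLMClient.create_chat_completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "hi"}],
        )
    assert result.content == "hello"
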
Future Work

Option: Move to RedisVL Directly

Currently, langchain_redis.RedisVectorStore owns embedding calls—it expects an Embeddings object and calls embed_query()/embed_documents() internally. Moving to RedisVL directly would give us full control:

# Current: LangChain controls embedding
vectorstore = RedisVectorStore(embeddings=LLMClient.create_embeddings(), ...)
await vectorstore.aadd_documents(docs)  # LangChain embeds internally

# Future: We control embedding
vectors = await LLMClient.create_embedding(model, texts)
query = VectorQuery(vector=vectors.embeddings[0], ...)
results = await index.query(query)

Trade-offs:

| Aspect | LiteLLM (LLMClient) | RedisVL Vectorizers |
|---|---|---|
| Providers | 100+ | ~6 (OpenAI, Azure, Cohere, HuggingFace, VertexAI, Mistral) |
| Already in codebase | Yes | New dependency path |
| Unified interface | Single LLMClient | Separate vectorizer classes |
| Async support | Native | Varies |

Recommendation: If we migrate to RedisVL, continue using LLMClient.create_embedding() rather than RedisVL vectorizers—broader provider support and the abstraction is already built.

nkanu17 (Contributor, Author) commented Jan 6, 2026

We use langchain_redis.RedisVectorStore for the embedding calls. It expects the langchain Embeddings object and calls embed_query()/embed_documents() internally.

If we move away from LangChain VectorStore to RedisVL directly, we can use either LiteLLM (via LLMClient.create_embedding()) or RedisVL's built-in vectorizers for embeddings. Since the repo now has LLMClient built as a unified abstraction with native async support, it would be nice to stick with LiteLLM. It gives broader provider coverage without introducing another embedding interface into the codebase.

For now this is not implemented.

@nkanu17 nkanu17 marked this pull request as ready for review January 6, 2026 13:41
Copilot AI review requested due to automatic review settings January 6, 2026 13:41
Copilot AI left a comment

Pull request overview

This PR replaces custom LLM wrapper classes with a unified LLMClient abstraction layer backed by LiteLLM, significantly reducing maintenance burden and enabling support for multiple providers without code changes.

Key Changes:

  • Introduced new agent_memory_server/llm/ package with LLMClient facade class
  • Removed agent_memory_server/llms.py (454 lines) and replaced with cleaner abstraction
  • Updated all LLM-consuming modules to use LLMClient.create_chat_completion() and LLMClient.create_embedding()

Reviewed changes

Copilot reviewed 67 out of 75 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
|---|---|
| agent_memory_server/llm/client.py | New LLMClient facade with chat completion, embedding, and query optimization methods |
| agent_memory_server/llm/types.py | Standardized response dataclasses (ChatCompletionResponse, EmbeddingResponse) |
| agent_memory_server/llm/exceptions.py | Custom exception hierarchy for LLM operations |
| agent_memory_server/llm/__init__.py | Clean re-exports for package consumers |
| agent_memory_server/main.py | Updated model validation to use LLMClient.get_model_config() |
| agent_memory_server/summarization.py | Removed client parameter, uses LLMClient directly |
| agent_memory_server/memory_strategies.py | Updated to use LLMClient.create_chat_completion() |
| agent_memory_server/long_term_memory.py | Removed llm_client parameters throughout |
| agent_memory_server/vectorstore_factory.py | Delegates embedding creation to LLMClient.create_embeddings() |
| agent_memory_server/extraction.py | Updated to use LLMClient.create_chat_completion() |
| agent_memory_server/api.py | Uses LLMClient.get_model_config() for model configuration |
| tests/test_llm_client.py | New comprehensive test suite for LLMClient |
| uv.lock, pyproject.toml | Added litellm>=1.80.11 dependency |
| Multiple test files | Updated mocks to use ChatCompletionResponse dataclass |
| Java client files | Whitespace cleanup (trailing whitespace removal) |
| Documentation files | Updated references from llms.py to llm package |

The PR is well-structured and follows good software engineering practices with proper abstraction, testing, and documentation updates. No critical issues found.


nkanu17 added 15 commits January 6, 2026 17:44
- Create unified LLMClient class with static methods for all LLM operations
- Add async create_chat_completion() using litellm.acompletion()
- Add async create_embedding() using litellm.aembedding()
- Add get_model_config() with fallback chain: MODEL_CONFIGS → LiteLLM → defaults
- Add optimize_query() for vector search query optimization
- Define standardized ChatCompletionResponse and EmbeddingResponse dataclasses
- Add LLMBackend Protocol for test injection
- Add custom exception hierarchy:
  - LLMClientError (base)
  - ModelValidationError
  - APIKeyMissingError
- Remove llms.py module (replaced by llm_client.py)
- Update api.py to use LLMClient.get_model_config()
- Update extraction.py to use LLMClient.create_chat_completion()
- Update long_term_memory.py to use LLMClient
- Update memory_strategies.py to use LLMClient
- Update summarization.py to use LLMClient

BREAKING CHANGE: llms.py module removed, use llm_client.py instead
- Use LLMClient.get_model_config() for model resolution
- Add custom exception handling with ModelValidationError and APIKeyMissingError
- Validate API keys based on resolved provider
- Improve error messages with abstracted terminology
- Add test_llm_client.py with unit tests for LLMClient
- Update conftest.py with LLMClient mock fixtures
- Update test_llms.py imports for compatibility
- Update test_api.py imports
- Update test_contextual_grounding.py imports
- Update test_contextual_grounding_integration.py imports
- Update test_llm_judge_evaluation.py imports
- Update test_long_term_memory.py imports
- Update test_memory_compaction.py imports
- Update test_memory_strategies.py imports
- Update test_no_worker_mode.py imports
- Update test_query_optimization_errors.py imports
- Update test_summarization.py imports
- Update pyproject.toml with any new dependencies
- Sync uv.lock file
- Add LLMClient documentation for AI assistant context
- Update project structure references
- Exclude dev_docs directory from version control
- Keep development documentation local only
Split the monolithic llm_client.py into a proper package structure:
- llm/exceptions.py: LLMClientError, ModelValidationError, APIKeyMissingError
- llm/types.py: ChatCompletionResponse, EmbeddingResponse, LLMBackend Protocol
- llm/client.py: LLMClient class with all methods
- llm/__init__.py: Re-exports for clean imports

This improves code organization and makes it easier to:
- Find specific components (exceptions, types, client)
- Test individual modules
- Extend the package with new backends
Update all application code imports from:
  from agent_memory_server.llm_client import ...
to:
  from agent_memory_server.llm import ...

Files updated:
- api.py
- extraction.py
- long_term_memory.py
- main.py
- memory_strategies.py
- summarization.py
Update all test file imports from:
  from agent_memory_server.llm_client import ...
to:
  from agent_memory_server.llm import ...

Update mock patch paths from:
  agent_memory_server.llm_client.LLMClient
to:
  agent_memory_server.llm.client.LLMClient

Also removed duplicate mock_llm_client fixture in conftest.py.
- Add LLMClient.create_embeddings() classmethod that returns LangChain Embeddings
- Extract _create_openai_embeddings() and _create_bedrock_embeddings() helpers
- Update vectorstore_factory.create_embeddings() to delegate to LLMClient
- Update tests to patch agent_memory_server.config.settings
- Add Embeddings to TYPE_CHECKING imports for proper type hints

This makes LLMClient the single entry point for all LLM operations:
- create_chat_completion() for chat
- create_embedding() for async embedding vectors
- create_embeddings() for LangChain Embeddings instances
- Fix test_memory_prompt_with_model_name to patch agent_memory_server.api.LLMClient.get_model_config
- Remove commented-out code from tests/conftest.py, tests/test_long_term_memory.py, tests/test_llm_client.py

vishal-bala left a comment

I'd be curious to know your thinking behind opting for the Client/Backend abstraction as opposed to a general Client abstract base class (that has a LiteLLM implementation subclass). I think both approaches can work here, but I feel like the single global LLMClient object with a reconfigurable LLMBackend comes with some potential Python scoping gotchas.

This was more of a higher-level look through - I saw that the LLMClient currently is directly calling LiteLLM. Is that intentional and meant to be abstracted in the future, or actually part of the design?

# Merge provider-specific kwargs
call_kwargs.update(kwargs)

response = await acompletion(**call_kwargs)

If LiteLLM is supposed to be the backend to the agnostic LLMClient, wouldn't we want a generic call to an LLMBackend method here rather than locking in LiteLLM?

nkanu17 (Contributor, Author) commented Jan 7, 2026

LLMClient (facade)
    ├── _backend: LLMBackend | None  # Only set in tests
    │
    ├── create_chat_completion()
    │       if _backend: use it (tests)
    │       else: call litellm.acompletion() directly
    │
    └── create_embedding()
            if _backend: use it (tests)
            else: call litellm.aembedding() directly

The _backend is only for tests; we wouldn't swap LiteLLM for something else. The protocol lets tests inject mocks without hitting real APIs.

I did consider what you said; we'd only want a different backend if we wanted to swap LiteLLM entirely for something else, such as:

  • Direct SDK calls (openai, anthropic, boto3)
  • A different unified library/custom gateway client

nkanu17 (Contributor, Author) commented Jan 7, 2026

And my thought was to revisit this when we need it. I did not want to create placeholder abstraction without a need yet.

nkanu17 (Contributor, Author) commented:

Update: I removed the _backend/_test_handler and the LLMBackend Protocol and replaced them with standard mocks.

nkanu17 (Contributor, Author) commented Jan 7, 2026

> I'd be curious to know your thinking behind opting for the Client/Backend abstraction as opposed to a general Client abstract base class (that has a LiteLLM implementation subclass). I think both approaches can work here, but I feel like the single global LLMClient object with a reconfigurable LLMBackend comes with some potential Python scoping gotchas.
>
> This was more of a higher-level look through - I saw that the LLMClient currently is directly calling LiteLLM. Is that intentional and meant to be abstracted in the future, or actually part of the design?

@vishal-bala
Yea so the backend is required for tests only, allowing tests to mock LLM calls without hitting real APIs. You are also right that the _backend state could cause issues in concurrent tests. However, we mitigate this by:

  • Only setting _backend in tests, never in production
  • Using LLMClient.reset() in test teardown

Open to an alternative.


Actually, this was a good discussion - I realized the _backend and LLMBackend names are confusing. I am changing it to TestLLMHandler (the protocol tests implement to mock LLM calls) and renaming _backend to _test_handler.


Update:

I removed the _backend/_test_handler and the LLMBackend Protocol and replaced them with standard mocks.

@abrookins abrookins force-pushed the feat/litellm-llmclient-abstraction branch from b6db891 to 93b64af Compare January 7, 2026 17:04
…ptions

- Unknown embedding model now raises ModelValidationError instead of defaulting to OpenAI
- Anthropic embedding request raises ModelValidationError (Anthropic has no embedding models)
- Unsupported provider in _map_provider raises ModelValidationError instead of defaulting to OpenAI
- Removed vertex_ai and gemini fallback mappings
…eModel

- ChatCompletionResponse and EmbeddingResponse now use Pydantic BaseModel
- Maintains frozen=True behavior via ConfigDict(frozen=True)
- Aligns with project's consistent use of Pydantic models
- Update frozen tests to expect Pydantic ValidationError instead of AttributeError
- Change None content test to empty string (Pydantic validates str type)
…anism

- Remove TestLLMHandler Protocol from types.py
- Remove _test_handler class variable from LLMClient
- Remove set_test_handler() and clear_test_handler() methods
- Remove conditional dispatch in create_chat_completion() and create_embedding()
- Remove TestLLMHandler from __init__.py exports

This eliminates test-only code paths from production code and removes
global mutable state from the otherwise stateless LLMClient facade.
- Remove MockLLMHandler class and clear_llm_client_test_handler fixture
- Remove TestLLMClientClearTestHandler and TestLLMClientTestHandlerInjection classes
- Refactor TestLLMClientChatCompletion to use @patch on acompletion
- Refactor TestLLMClientEmbedding to use @patch on aembedding
- Add helper functions for creating mock LiteLLM responses

This aligns LLM client tests with the standard mocking patterns used
throughout the rest of the test suite.
- test_create_embeddings_unsupported_provider: expect ModelValidationError
  instead of ValueError to match actual exception raised by LLMClient
- test_optimization_with_none_content_response: mock at LiteLLM level
  instead of creating invalid ChatCompletionResponse with content=None
Replace manual single-quote wrapping with Python's !r (repr) format
specifier for more idiomatic string formatting that also handles
edge cases like values containing quotes.

Changed locations:
- Unknown embedding model error message
- Unsupported LiteLLM provider error message
- Model not found warning message
- Query optimization debug/warning messages

abrookins (Collaborator) left a comment

Looks amazing! LGTM 👍

- Add TestLLMClientIntegrationHuggingFace class with tests for:
  - Chat completion via huggingface/meta-llama/Llama-3.3-70B-Instruct
  - Embeddings via huggingface/microsoft/codebert-base
- Add TestLLMClientIntegrationGemini class with test for:
  - Chat completion via gemini/gemini-1.5-flash
- Tests require HF_TOKEN and GEMINI_API_KEY respectively
- Tests skip gracefully when API keys are not set
- Run with: uv run pytest tests/test_llm_client.py --run-api-tests
@nkanu17 nkanu17 merged commit 4baa5a7 into main Jan 8, 2026
18 checks passed