AbstractCore provides a unified interface to major LLM providers with production-oriented reliability features. This document explains how it works internally and why it's designed this way.
If you're new to AbstractCore and want to start building quickly, read:
- docs/getting-started.md
- docs/api.md
Related docs (user-facing):
- Media inputs (images/audio/video + documents): docs/media-handling-system.md
- Vision input + fallback: docs/vision-capabilities.md
- Capability plugins (voice/audio/vision): docs/capabilities.md
- OpenAI-compatible gateway server: docs/server.md
- Single-model OpenAI-compatible endpoint: docs/endpoint.md
- Tool calling semantics (passthrough vs execution): docs/tool-calling.md
AbstractCore operates as a Python library and can also be exposed via optional OpenAI-compatible HTTP servers:
- Gateway server (multi-provider): abstractcore.server.app (docs: docs/server.md)
- Endpoint server (single-model): abstractcore.endpoint.app (docs: docs/endpoint.md)
graph TD
A[Your Application] --> B[AbstractCore API]
AA[HTTP Clients] --> BB[AbstractCore Server]
BB --> B
B --> C[Provider Interface]
C --> D[Event System]
C --> E[Tool System]
C --> F[Retry System]
C --> G[Provider Implementations]
G --> H[OpenAI Provider]
G --> HH[OpenAI-Compatible Provider]
G --> I[Anthropic Provider]
G --> J[Ollama Provider]
G --> K[MLX Provider]
G --> L[LMStudio Provider]
G --> M[HuggingFace Provider]
G --> MM[vLLM Provider]
G --> MN[OpenRouter Provider]
G --> MP[Portkey Provider]
H --> N[OpenAI API]
HH --> NN[OpenAI-Compatible /v1 Endpoint]
I --> O[Anthropic API]
J --> P[Ollama Server]
K --> Q[MLX Models]
L --> R[LMStudio Server]
M --> S[HuggingFace Models]
MM --> RR[vLLM Server]
MN --> RO[OpenRouter API]
MP --> RP[Portkey API Gateway]
style B fill:#e1f5fe
style BB fill:#4caf50
style C fill:#f3e5f5
style G fill:#fff3e0
Goal: Same interface for all providers
Implementation: Common interface with provider-specific implementations

Goal: Handle real-world failures gracefully
Implementation: Built-in retry logic, circuit breakers, comprehensive error handling

Goal: Tools work everywhere, even with providers that don't support them natively
Implementation: Native support where available, intelligent prompting as fallback

Goal: Clean, focused API that's easy to understand
Implementation: Minimal core with clear extension points

Goal: Flexible deployment as library or server
Implementation: OpenAI-compatible REST API built on core library
The main entry point uses the factory pattern for clean provider instantiation:
graph LR
A[create_llm] --> B{Provider Type}
B --> C[OpenAI Provider]
B --> D[Anthropic Provider]
B --> E[Ollama Provider]
B --> F[Other Providers...]
C --> G[Configured Instance]
D --> G
E --> G
F --> G
style A fill:#4caf50
style G fill:#2196f3
from abstractcore import create_llm
# Factory creates the right provider with proper configuration
llm = create_llm("openai", model="gpt-4o-mini", temperature=0.7)
# OpenAI-compatible /v1 endpoints (LMStudio, vLLM, custom proxies)
llm_local = create_llm("lmstudio", model="qwen/qwen3-4b-2507", base_url="http://localhost:1234/v1")
llm_openrouter = create_llm("openrouter", model="openai/gpt-4o-mini") # requires OPENROUTER_API_KEY
llm_portkey = create_llm("portkey", model="gpt-4o-mini", config_id="pcfg_...")  # requires PORTKEY_API_KEY + PORTKEY_CONFIG

Gateway providers (OpenRouter/Portkey) route to external backends; AbstractCore forwards only explicit generation parameters to avoid sending defaults that strict backends reject.
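The "forward only explicit parameters" idea can be illustrated with a small standalone sketch. This is not AbstractCore's code: `build_request_params` and its parameter set are hypothetical, and the sketch assumes "explicit" simply means "not left as `None`".

```python
from typing import Any, Dict, Optional


def build_request_params(
    model: str,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a payload that contains only the parameters the caller set.

    Hypothetical helper: unset (None) parameters are omitted entirely, so the
    backend applies its own defaults instead of receiving ours.
    """
    candidates = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }
    payload: Dict[str, Any] = {"model": model}
    # Keep explicit values, including falsy ones such as temperature=0.0.
    payload.update({k: v for k, v in candidates.items() if v is not None})
    return payload


print(build_request_params("openai/gpt-4o-mini", temperature=0.2))
```

Filtering on `is not None` rather than truthiness matters here: a caller who asks for `temperature=0.0` has made an explicit choice that should still be forwarded.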
All providers implement AbstractCoreInterface (see abstractcore/core/interface.py):
class AbstractCoreInterface(ABC):
    @abstractmethod
    def generate(
        self,
        prompt: str,
        messages: Optional[List[Dict[str, str]]] = None,
        system_prompt: Optional[str] = None,
        tools: Optional[List[Dict[str, Any]]] = None,
        media: Optional[List[Union[str, Dict[str, Any], "MediaContent"]]] = None,
        stream: bool = False,
        thinking: Optional[Union[bool, str]] = None,
        **kwargs,
    ) -> Union[GenerateResponse, Iterator[GenerateResponse]]:
        """Generate a response (or a stream of chunks)."""

    @abstractmethod
    def get_capabilities(self) -> List[str]:
        """Get provider capabilities"""

    @abstractmethod
    def unload_model(self, model_name: str) -> None:
        """Unload/cleanup resources for a specific model (best-effort)."""

This ensures:
- Consistency: Same methods across all providers
- Reliability: Standardized error handling
- Extensibility: Easy to add new providers
- Memory Management: Explicit control over model lifecycle
BaseProvider also applies asset-driven response normalization so downstream code sees clean, consistent output across providers:
- Output wrappers: Strip configured leading/trailing wrapper tokens (e.g., GLM <|begin_of_box|>…<|end_of_box|>)
- Harmony transcripts (GPT-OSS): Extract <|channel|>final into GenerateResponse.content and capture <|channel|>analysis as GenerateResponse.metadata["reasoning"] (non-streaming)
- Thinking tags: Extract inline <think>...</think> blocks into GenerateResponse.metadata["reasoning"] (when configured)
Why this belongs in BaseProvider (even for streaming):
- These artifacts are model/template-specific, not provider-specific (the same model can be served via Ollama, vLLM, LMStudio, HF, or MLX)
- In streaming mode, wrappers often appear in the first/last chunks; stripping them incrementally avoids leaking markup into UIs and tool parsers without buffering the full response
Configuration comes from abstractcore/assets/architecture_formats.json and abstractcore/assets/model_capabilities.json; implementation lives in abstractcore/architectures/response_postprocessing.py.
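As a rough illustration of this kind of normalization (non-streaming case only), the sketch below strips one wrapper pair and moves `<think>` blocks into a metadata dict. The function name `normalize_output` and the hard-coded wrapper pair are assumptions for the example; the real behavior is configured through the JSON assets and implemented in the module named above.

```python
import re
from typing import Dict, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def normalize_output(
    text: str,
    wrappers: Tuple[str, str] = ("<|begin_of_box|>", "<|end_of_box|>"),
) -> Tuple[str, Dict[str, str]]:
    """Return (clean_content, metadata) for a complete, non-streamed response."""
    metadata: Dict[str, str] = {}

    # 1. Move inline thinking blocks out of the visible content.
    reasoning = [block.strip() for block in THINK_RE.findall(text)]
    if reasoning:
        metadata["reasoning"] = "\n".join(reasoning)
        text = THINK_RE.sub("", text)

    # 2. Strip a single leading/trailing wrapper token pair, if present.
    text = text.strip()
    start, end = wrappers
    if start and text.startswith(start):
        text = text[len(start):]
    if end and text.endswith(end):
        text = text[: -len(end)]
    return text.strip(), metadata


print(normalize_output("<think>add 2+2</think><|begin_of_box|>4<|end_of_box|>"))
```

Because the rules depend on the model's template rather than on the server that hosts it, a helper like this can sit once in a shared base class instead of being repeated per provider.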
AbstractCore's model capability routing and architecture formatting are driven by two canonical JSON registries:
- abstractcore/assets/model_capabilities.json: model limits, tool/structured output flags, multimodal support, aliases
- abstractcore/assets/architecture_formats.json: message formats, tool call syntax, response wrappers, detection patterns
When a new model or architecture is released (or an existing one changes), update these files first. See abstractcore/assets/README.md for field requirements and update rules.
The unload_model(model_name) method is a best-effort resource cleanup hook.
- API providers (OpenAI, Anthropic): typically a no-op (safe to call).
- Local / self-hosted providers: behavior is provider-specific:
- some can actively release memory (or request server-side eviction),
- others can only close client connections and rely on server-side TTL/auto-eviction.
- Example: LMStudio does not expose an explicit “unload model” API; unload_model() closes HTTP clients and relies on LMStudio TTL/auto-evict.
In the OpenAI-compatible AbstractCore server (abstractcore.server.app), requests can set unload_after (default false)
to call llm.unload_model(model) after the request completes. For providers that can unload shared server state (e.g. Ollama),
this is disabled by default and must be explicitly enabled by the server operator.
# Load model, use it, then free memory
llm = create_llm("ollama", model="large-model")
response = llm.generate("Hello")
llm.unload_model(llm.model) # Explicitly free memory
del llm

This is critical for:
- Test suites that load multiple models sequentially
- Memory-constrained environments (<32GB RAM)
- Production systems serving different models sequentially
AbstractCore includes a policy-driven media handling system that enables file attachments across all providers:
graph TD
A[User Input: @file.pdf] --> B[MessagePreprocessor]
B --> C[Extract Files + Clean Text]
C --> D[AutoMediaHandler]
D --> E{File Type Detection}
E -->|Images| F[ImageProcessor]
E -->|PDFs| G[PDFProcessor]
E -->|Office| H[OfficeProcessor]
E -->|Text/CSV| I[TextProcessor]
F --> J[MediaContent Objects]
G --> J
H --> J
I --> J
J --> K{Provider Type}
K -->|OpenAI| L[OpenAI Format]
K -->|Anthropic| M[Anthropic Format]
K -->|Local| N[Text Embedding]
L --> O[Provider API Call]
M --> O
N --> O
style D fill:#4caf50
style J fill:#2196f3
style O fill:#ff9800
Core Components:
- MessagePreprocessor: Parses @filename syntax in CLI and extracts file references
- AutoMediaHandler: Intelligent coordinator that selects appropriate processors
- Specialized Processors:
  - ImageProcessor (PIL-based for images)
  - PDFProcessor (PyMuPDF4LLM for documents)
  - OfficeProcessor (Unstructured for DOCX/XLSX/PPTX)
  - TextProcessor (pandas for CSV/TSV data analysis)
- Provider Handlers: Format media content for each provider's API requirements
Provider-Specific Formatting:
# Same MediaContent gets formatted differently:
# OpenAI (JSON with image_url):
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze this"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
]
}
# Anthropic (Messages API with source):
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze this"},
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}}
]
}
# Local (Text embedding):
"Analyze this\n\nImage description: A chart showing quarterly trends..."

Graceful Fallback Strategy:
- Advanced Processing: PyMuPDF4LLM, Unstructured libraries
- Basic Processing: Simple text extraction
- Metadata Fallback: File information and properties

- Degrades gracefully for documents: PDFs/Office/text aim to return best-effort extracted text/metadata rather than crashing.
- Policy-driven for true multimodal inputs: for image/audio/video message parts, behavior is policy-driven; unsupported requests fail loudly unless an explicit enrichment fallback is configured (see docs/media-handling-system.md and docs/vision-capabilities.md).
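The three document tiers above (advanced, basic, metadata) amount to trying extractors in order and keeping the first one that succeeds. A minimal sketch of that pattern follows; `extract_with_fallback`, `advanced` and `basic` are hypothetical stand-ins, not AbstractCore's processors, and the "missing dependency" is simulated.

```python
from typing import Callable, List, Tuple

Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str, extractors: List[Tuple[str, Extractor]]
) -> Tuple[str, str]:
    """Try each (tier_name, extractor) in order; return the first success.

    If every extractor fails, fall back to a metadata-only description so the
    caller still gets something usable instead of an exception.
    """
    for tier, extractor in extractors:
        try:
            return tier, extractor(path)
        except Exception:
            continue  # this tier is unavailable or failed; try the next one
    return "metadata", f"[file: {path}]"


def advanced(path: str) -> str:
    # Simulates an optional dependency that is not installed.
    raise ImportError("advanced PDF library not installed")


def basic(path: str) -> str:
    return f"plain text extracted from {path}"


print(extract_with_fallback("report.pdf", [("advanced", advanced), ("basic", basic)]))
```

This best-effort behavior fits documents only; for image/audio/video parts the text above describes a stricter, policy-driven path that fails loudly by default.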
The same media=[] parameter works across all providers:
# Universal API - works with any provider
llm = create_llm("openai", model="gpt-4o") # or "anthropic", "ollama", etc.
response = llm.generate(
    "Analyze these files",
    media=["report.pdf", "chart.png", "data.xlsx"]
)

CLI Integration:
# Simple @filename syntax works everywhere
python -m abstractcore.utils.cli --prompt "What's in @document.pdf and @image.jpg"

To keep the default abstractcore install dependency-light while still enabling deterministic modality APIs, AbstractCore supports optional capability plugins:
- abstractvoice provides core.voice + core.audio (TTS/STT).
- abstractvision provides core.vision (T2I/I2I/T2V/I2V; backend-pluggable).
Discovery:
- llm.capabilities.status() returns a JSON-safe snapshot (which backends are available/selected, plus install hints).
- Convenience facades exist as properties: llm.voice, llm.audio, llm.vision (lazy; missing plugins raise actionable errors).
sequenceDiagram
participant App as Your App
participant Core as AbstractCore
participant Events as Event System
participant Retry as Retry Logic
participant Provider as LLM Provider
participant Tools as Tool System
App->>Core: generate("prompt", tools=tools)
Core->>Events: emit(GENERATION_STARTED)
Core->>Retry: wrap_with_retry()
alt Provider Call Success
Retry->>Provider: API call
Provider->>Retry: response
Retry->>Core: successful response
else Provider Call Fails
Retry->>Provider: API call (attempt 1)
Provider->>Retry: rate limit error
Retry->>Retry: wait with backoff
Retry->>Provider: API call (attempt 2)
Provider->>Retry: success
Retry->>Core: successful response
end
alt Has Tool Calls
Core->>Events: emit(TOOL_STARTED)
Core->>Tools: execute_tools()
Tools->>Core: tool results
Core->>Events: emit(TOOL_COMPLETED)
end
Core->>Events: emit(GENERATION_COMPLETED)
Core->>App: GenerateResponse
Note: in the Python API, execute_tools defaults to False (pass-through). Tool calls are returned in GenerateResponse.tool_calls for your host/runtime to execute. execute_tools=True exists for simple demos but is deprecated for most production use cases. The optional HTTP gateway server runs in pass-through mode.
The tool system provides universal tool-call detection (and optional local execution) across all providers:
graph TD
A[LLM Response] --> B{Has Tool Calls?}
B -->|No| C[Return Response]
B -->|Yes| D[Parse Tool Calls]
D --> E[Event: TOOL_STARTED]
E --> F{Event Prevented?}
F -->|Yes| G[Skip Tool Execution]
F -->|No| H[Execute Tools]
H --> I[Collect Results]
I --> J[Event: TOOL_COMPLETED]
J --> K[Append Results to Response]
K --> C
style D fill:#ffeb3b
style H fill:#4caf50
style E fill:#ff9800
- Tool Detection: Parse tool calls from LLM response
- Event Emission: Emit TOOL_STARTED (preventable)
- Optional local execution (deprecated): execute tools inside AbstractCore when execute_tools=True (providers never execute arbitrary local tools)
- Result Collection: Gather results and error information
- Event Emission: Emit TOOL_COMPLETED with results
- Response Integration: Append tool results to original response
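The "preventable event" step is the part of this flow that most benefits from an example. The sketch below shows the general pattern with a hypothetical `ToolStarted` payload and `prevent()` method; these names are assumptions for illustration and are not taken from AbstractCore's event API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolStarted:
    """Hypothetical event payload; not AbstractCore's actual event class."""
    tool_name: str
    arguments: Dict[str, Any]
    prevented: bool = False

    def prevent(self) -> None:
        self.prevented = True


def run_tool_call(
    call: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
    listeners: List[Callable[[ToolStarted], None]],
) -> Dict[str, Any]:
    """Emit a preventable 'started' event, then execute unless a listener vetoed."""
    event = ToolStarted(call["name"], call.get("arguments", {}))
    for listener in listeners:
        listener(event)
    if event.prevented:
        return {"name": event.tool_name, "skipped": True}
    try:
        result = tools[event.tool_name](**event.arguments)
        return {"name": event.tool_name, "result": result}
    except Exception as exc:  # collect errors instead of crashing the caller
        return {"name": event.tool_name, "error": str(exc)}


def block_deletes(event: ToolStarted) -> None:
    # Example security gate: veto a tool by name before it runs.
    if event.tool_name == "delete_file":
        event.prevent()


tools = {"delete_file": lambda path: f"deleted {path}", "add": lambda a, b: a + b}
print(run_tool_call({"name": "delete_file", "arguments": {"path": "/tmp/x"}}, tools, [block_deletes]))
print(run_tool_call({"name": "add", "arguments": {"a": 2, "b": 3}}, tools, [block_deletes]))
```

Running listeners before execution lets policy code (allow-lists, argument checks) live outside the tool implementations themselves.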
graph LR
A[Tool Definition] --> B{Provider Type}
B --> C[OpenAI: Native JSON]
B --> D[Anthropic: Native XML]
B --> E[Ollama: Architecture-specific]
B --> F[Others: Prompted Format]
C --> G[LLM Generation]
D --> G
E --> G
F --> G
G --> H[Tool Call Tag Rewriter]
H --> I[Target Format Conversion]
I --> J[Universal Tool Parser]
J --> K[Local Tool Execution]
style A fill:#e1f5fe
style H fill:#ff9800
style I fill:#9c27b0
style K fill:#4caf50
AbstractCore includes a tag rewriting system that converts tool-call markup into the format a given agentic CLI expects:
Rewriting Pipeline:
graph TD
A[Raw LLM Response] --> B[Pattern Detection]
B --> C{Tag Format Needed?}
C -->|No| D[Default Qwen3 Format]
C -->|Yes| E[Target Format Conversion]
E --> F{Format Type}
F -->|Predefined| G[llama3, xml, gemma, etc.]
F -->|Custom| H[User-defined Tags]
G --> I[Rewritten Tool Call]
H --> I
D --> I
I --> J[Tool Execution]
style B fill:#2196f3
style E fill:#ff9800
style I fill:#4caf50
Supported Formats:
- Default (Qwen3): <|tool_call|>...JSON...</|tool_call|> - Compatible with Codex CLI
- LLaMA3: <function_call>...JSON...</function_call> - Compatible with Crush CLI
- XML: <tool_call>...JSON...</tool_call> - Compatible with Gemini CLI
- Gemma: tool_code...JSON... - Compatible with Gemma models
- Custom: Any user-defined format (e.g., [TOOL]...JSON...[/TOOL])
Real-Time Integration:
- Streaming Compatible: Works seamlessly with unified streaming architecture
- Low Latency: Rewriting runs inline during streaming and is designed to avoid large buffering
- Universal Detection: Automatically detects source format from any model
- Graceful Fallback: Returns original content if rewriting fails
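To make the rewriting step concrete, here is a small regex-based sketch that converts between three of the tag pairs listed above on a complete (non-streamed) string. `rewrite_tool_tags` and `TARGET_TAGS` are hypothetical names, and the sketch does not cover the Gemma or custom formats; it only illustrates the technique.

```python
import re

# Tag pairs copied from the format list above; the dict itself is illustrative.
TARGET_TAGS = {
    "qwen3": ("<|tool_call|>", "</|tool_call|>"),
    "llama3": ("<function_call>", "</function_call>"),
    "xml": ("<tool_call>", "</tool_call>"),
}

# Match any known opening tag, a lazily captured body, then any known closing tag.
_SOURCE_RE = re.compile(
    r"(?:<\|tool_call\|>|<function_call>|<tool_call>)"
    r"(.*?)"
    r"(?:</\|tool_call\|>|</function_call>|</tool_call>)",
    re.DOTALL,
)


def rewrite_tool_tags(text: str, target: str) -> str:
    """Rewrite detected tool-call tags to the target format.

    Unknown targets return the text unchanged (graceful fallback).
    """
    if target not in TARGET_TAGS:
        return text
    open_tag, close_tag = TARGET_TAGS[target]
    # A function replacement avoids backslash-escape surprises in the JSON body.
    return _SOURCE_RE.sub(
        lambda m: f"{open_tag}{m.group(1).strip()}{close_tag}", text
    )


print(rewrite_tool_tags('Sure. <|tool_call|>{"name": "ls"}</|tool_call|>', "llama3"))
```

Text outside the tags is left untouched, so ordinary prose in the response passes through as-is.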
Production-grade error handling with multiple layers:
graph TD
A[LLM Request] --> B[Retry Manager]
B --> C{Error Type}
C -->|Rate Limit| D[Exponential Backoff]
C -->|Network Error| D
C -->|Timeout| D
C -->|Auth Error| E[Fail Fast]
C -->|Invalid Request| E
D --> F{Max Attempts?}
F -->|No| G[Wait + Jitter]
G --> H[Retry Request]
H --> B
F -->|Yes| I[Circuit Breaker]
I --> J{Failure Threshold?}
J -->|No| K[Return Error]
J -->|Yes| L[Open Circuit]
L --> M[Fail Fast for Duration]
style D fill:#ff9800
style I fill:#f44336
style L fill:#d32f2f
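The decision flow in the diagram above (retry transient errors with capped exponential backoff and jitter, fail fast on others, open a circuit after repeated failures) can be sketched in plain Python. This is an illustrative stand-in, not AbstractCore's retry module: the `RetryManager` class and the three exception types are hypothetical, and the constructor arguments only borrow the parameter names used in the configuration example.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception): ...   # e.g. rate limit, network error, timeout
class FatalError(Exception): ...       # e.g. auth error, invalid request
class CircuitOpenError(Exception): ...


class RetryManager:
    def __init__(self, max_attempts: int = 3, initial_delay: float = 1.0,
                 max_delay: float = 60.0, use_jitter: bool = True,
                 failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_attempts, self.initial_delay = max_attempts, initial_delay
        self.max_delay, self.use_jitter = max_delay, use_jitter
        self.failure_threshold, self.recovery_timeout = failure_threshold, recovery_timeout
        self._sleep = sleep                      # injectable so tests need not wait
        self._failures = 0                       # consecutive exhausted calls
        self._opened_at: Optional[float] = None  # when the circuit opened

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self._opened_at = None  # recovery window elapsed: allow a trial call
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = fn()
            except FatalError:
                raise  # retrying cannot fix these
            except RetryableError:
                if attempt == self.max_attempts:
                    self._failures += 1
                    if self._failures >= self.failure_threshold:
                        self._opened_at = time.monotonic()
                    raise
                delay = min(self.initial_delay * 2 ** (attempt - 1), self.max_delay)
                if self.use_jitter:
                    delay *= random.uniform(0.5, 1.0)
                self._sleep(delay)
            else:
                self._failures = 0
                return result
        raise RuntimeError("max_attempts must be >= 1")
```

Passing `sleep=lambda s: None` (or a list's `append`) makes the backoff schedule observable in tests without real waiting.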
from abstractcore import create_llm
from abstractcore.core.retry import RetryConfig
config = RetryConfig(
    max_attempts=3,         # Try up to 3 times
    initial_delay=1.0,      # Start with 1 second delay
    max_delay=60.0,         # Cap at 1 minute
    use_jitter=True,        # Add randomness
    failure_threshold=5,    # Circuit breaker after 5 failures
    recovery_timeout=60.0   # Test recovery after 1 minute
)
llm = create_llm("openai", model="gpt-4o-mini", retry_config=config)

Observability hooks through events:
graph TD
A[LLM Operation] --> B[Event Emission]
B --> C[Global Event Bus]
C --> D[Event Listeners]
D --> E[Monitoring]
D --> F[Logging]
D --> G[Cost Tracking]
D --> H[Tool Control]
D --> I[Custom Logic]
E --> J[Metrics Dashboard]
F --> K[Log Files]
G --> L[Cost Alerts]
H --> M[Security Gates]
I --> N[Business Logic]
style B fill:#9c27b0
style C fill:#673ab7
style H fill:#f44336
from abstractcore.events import EventType, on_global
# alert() and log() below stand in for your own notification/logging helpers.

# Cost monitoring (best-effort estimate; based on token usage)
def monitor_costs(event):
    if event.type != EventType.GENERATION_COMPLETED:
        return
    cost = event.data.get("cost_usd")
    if isinstance(cost, (int, float)) and cost > 0.10:
        alert(f"High estimated cost: ${cost:.2f}")

# Tool monitoring
def log_tools(event):
    if event.type == EventType.TOOL_COMPLETED:
        log(f"Tool completed: {event.data.get('tool_name')}")

# Performance tracking
def track_performance(event):
    if event.type != EventType.GENERATION_COMPLETED:
        return
    duration_ms = event.data.get("duration_ms")
    if isinstance(duration_ms, (int, float)) and duration_ms > 10_000:
        log(f"Slow request: {float(duration_ms):.0f}ms")

on_global(EventType.GENERATION_COMPLETED, monitor_costs)
on_global(EventType.TOOL_COMPLETED, log_tools)
on_global(EventType.GENERATION_COMPLETED, track_performance)

Type-safe responses with automatic validation, retry, and unified streaming:
graph TD
A[LLM Generate] --> B{Streaming Mode?}
B -->|Yes| C[Unified Streaming Processor]
B -->|No| D[Standard JSON Parsing]
C --> E[Incremental Tool Detector]
E --> F[Real-time Chunk Processing]
F --> G[Tool Call Detection]
G --> H[Mid-Stream Tool Execution]
D --> I[Parse JSON]
I --> J{Valid JSON?}
J -->|No| K[Retry with Error Feedback]
J -->|Yes| L[Pydantic Validation]
L --> M{Valid Model?}
M -->|No| K
M -->|Yes| N[Return Typed Object]
K --> O{Max Retries?}
O -->|No| A
O -->|Yes| P[Raise ValidationError]
style C fill:#4caf50
style E fill:#2196f3
style F fill:#ff9800
style G fill:#9c27b0
style K fill:#f44336
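The non-streaming branch of the diagram above (parse JSON, validate, retry with error feedback, give up after a retry limit) can be sketched with the standard library alone. To stay dependency-free, this sketch swaps Pydantic for a hand-written validator; `generate_structured`, `validate_user` and the `llm` callable are hypothetical names, not AbstractCore's API.

```python
import json
from typing import Any, Callable, Dict, List


def validate_user(data: Dict[str, Any]) -> List[str]:
    """Stand-in for schema validation; returns a list of error messages."""
    errors = []
    age = data.get("age")
    if not isinstance(age, int) or age <= 0:
        errors.append(f"Field 'age': Age must be positive (got {age!r})")
    if "@" not in str(data.get("email", "")):
        errors.append("Field 'email': Invalid email format")
    return errors


def generate_structured(
    llm: Callable[[str], str],
    prompt: str,
    validate: Callable[[Dict[str, Any]], List[str]],
    max_retries: int = 3,
) -> Dict[str, Any]:
    """Call llm until its output parses as JSON and passes validation."""
    current = prompt
    for _ in range(max_retries):
        raw = llm(current)
        try:
            data = json.loads(raw)
            if isinstance(data, dict):
                errors = validate(data)
            else:
                errors = ["Top-level JSON value must be an object"]
        except json.JSONDecodeError as exc:
            errors = [f"Invalid JSON: {exc.msg}"]
        if not errors:
            return data
        # Feed the concrete errors back so the next attempt can correct them.
        feedback = "\n".join(f"• {e}" for e in errors)
        current = (
            f"{prompt}\n\nIMPORTANT: Your previous response had validation errors:\n"
            f"{feedback}\nPlease correct these errors and provide valid JSON."
        )
    raise ValueError(f"No valid response after {max_retries} attempts")
```

Each retry restarts from the original prompt plus the latest errors, which keeps the feedback message from growing with every attempt.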
AbstractCore’s streaming system provides character-by-character streaming with incremental tool detection and optional tool-call syntax rewriting.
Architecture Components:
graph TD
A[Stream Input] --> B[UnifiedStreamProcessor]
B --> C[IncrementalToolDetector]
C --> D[Tag Rewriter]
D --> E[Tool Execution (optional)]
E --> F[Stream Output]
B --> G[Character-by-Character Handling]
G --> H[Intelligent Buffering]
H --> C
style B fill:#4caf50
style C fill:#2196f3
style D fill:#ff9800
style E fill:#9c27b0
Key Features:
- Unified Streaming Strategy
  - Single consistent approach across all providers
  - Best-effort time-to-first-token (TTFT) telemetry for debugging
  - Minimal buffering (incremental parsing)
- Incremental Tool Detection
  - Real-time tool call detection during streaming
  - Emits chunk.tool_calls as soon as a full tool call is detected
  - Handles partial tool calls across chunk boundaries
- Character-by-Character Streaming
  - Handles micro-chunking from providers (very small deltas)
  - Intelligent buffering for partial tool calls
  - Robust parsing with auto-repair for malformed JSON
- Tool Call Tag Rewriting Integration
  - Real-time format conversion during streaming
  - Support for multiple formats (Qwen3, LLaMA3, Gemma, XML, custom)
  - Designed to avoid large buffering while keeping tool calls structured
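Handling a tool call that is split across chunk boundaries is the subtle part of the features listed above. The sketch below shows one way to do it for a single tag pair: emit plain text immediately, but hold back anything that could be the start of a tag. `IncrementalDetector` here is a hypothetical teaching example, not the class in abstractcore/providers/streaming.py, and it omits JSON auto-repair.

```python
import json
from typing import Any, Dict, List, Tuple

OPEN, CLOSE = "<|tool_call|>", "</|tool_call|>"


class IncrementalDetector:
    def __init__(self) -> None:
        self._buf = ""  # text that may still turn into a tool call

    def feed(self, chunk: str) -> Tuple[str, List[Dict[str, Any]]]:
        """Return (text safe to display now, tool calls completed by this chunk)."""
        self._buf += chunk
        out: List[str] = []
        calls: List[Dict[str, Any]] = []
        while True:
            start = self._buf.find(OPEN)
            if start == -1:
                # No full opening tag: emit everything except a possible
                # partial tag at the very end (e.g. "<|tool").
                keep = self._partial_open_len()
                cut = len(self._buf) - keep
                out.append(self._buf[:cut])
                self._buf = self._buf[cut:]
                break
            out.append(self._buf[:start])
            end = self._buf.find(CLOSE, start)
            if end == -1:
                self._buf = self._buf[start:]  # wait for the closing tag
                break
            body = self._buf[start + len(OPEN):end]
            try:
                calls.append(json.loads(body))
            except json.JSONDecodeError:
                out.append(self._buf[start:end + len(CLOSE)])  # pass through as text
            self._buf = self._buf[end + len(CLOSE):]
        return "".join(out), calls

    def flush(self) -> str:
        """Call at end of stream: return whatever is still buffered."""
        rest, self._buf = self._buf, ""
        return rest

    def _partial_open_len(self) -> int:
        # Length of the longest buffer suffix that is a proper prefix of OPEN.
        for n in range(min(len(OPEN) - 1, len(self._buf)), 0, -1):
            if OPEN.startswith(self._buf[-n:]):
                return n
        return 0


detector = IncrementalDetector()
for piece in ["Hi <|tool", '_call|>{"name": "ls"', "}</|tool_call|> done"]:
    print(detector.feed(piece))
```

Only the bytes that might belong to a tag are delayed, so ordinary text reaches the UI as soon as it arrives.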
Streaming with Tag Rewriting Example:
from abstractcore import create_llm, tool
@tool
def analyze_code(code: str) -> str:
    """Return a small, deterministic analysis."""
    return f"chars={len(code)}"

llm = create_llm("ollama", model="qwen3:4b-instruct")  # requires Ollama running (default: http://localhost:11434)
for chunk in llm.generate(
    "Write a Python function, then call analyze_code on it.",
    stream=True,
    tools=[analyze_code],
    tool_call_tags="llama3",  # Emit <function_call>...</function_call> style tags
):
    print(chunk.content or "", end="", flush=True)
    if chunk.tool_calls:
        print(f"\nTool calls: {chunk.tool_calls}")
# Output format: <function_call>{"name": "analyze_code"}...</function_call>

Implementation pointers (source of truth):
- Unified streaming + tool detection: abstractcore/providers/streaming.py
- Streaming wrapper + TTFT metadata: abstractcore/providers/base.py
When validation fails, AbstractCore provides detailed feedback to the LLM:
# If LLM returns invalid data, AbstractCore automatically retries with:
"""
IMPORTANT: Your previous response had validation errors:
• Field 'age': Age must be positive (got -25)
• Field 'email': Invalid email format
Please correct these errors and provide valid JSON.
"""

Simple conversation memory without complexity:
graph LR
A[BasicSession] --> B[Message History]
A --> C[System Prompt]
A --> D[Provider Reference]
B --> E[generate()]
C --> E
D --> E
E --> F[Add to History]
F --> G[Return Response]
A --> H[save()/load()]
H --> I[JSON Persistence]
style A fill:#2196f3
style B fill:#4caf50
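The diagram above boils down to a message list, a provider reference, and JSON persistence. A minimal sketch of that shape is shown below; `MiniSession` is a hypothetical class written for this example (the real one is BasicSession, whose exact constructor and method signatures are not shown in this document), and the provider is modeled as a plain callable.

```python
import json
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


class MiniSession:
    """Illustrative only: history + system prompt + provider reference."""

    def __init__(
        self,
        provider: Callable[[List[Message]], str],
        system_prompt: Optional[str] = None,
    ) -> None:
        self.provider = provider
        self.messages: List[Message] = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def generate(self, prompt: str) -> str:
        # Record the user turn, call the provider with full history,
        # then record the assistant turn before returning it.
        self.messages.append({"role": "user", "content": prompt})
        reply = self.provider(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.messages, fh, ensure_ascii=False, indent=2)

    @classmethod
    def load(
        cls, path: str, provider: Callable[[List[Message]], str]
    ) -> "MiniSession":
        session = cls(provider)
        with open(path, encoding="utf-8") as fh:
            session.messages = json.load(fh)
        return session


# Toy provider: reports how many messages it was given.
session = MiniSession(lambda msgs: f"seen {len(msgs)} messages", system_prompt="Be brief.")
print(session.generate("Hello"))
```

Only the messages are persisted; the provider is supplied again on load, since a live client object is not something JSON can store.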
The AbstractCore server provides OpenAI-compatible HTTP endpoints built on top of the core library:
graph TD
A[HTTP Client] --> B[FastAPI Server]
B --> C{Endpoint Router}
C --> D[/v1/chat/completions]
C --> E[/v1/embeddings]
C --> F[/v1/models]
C --> G[/providers]
C --> Img[/v1/images/* (optional)]
C --> Aud[/v1/audio/* (optional)]
C --> Cache[/acore/prompt_cache/*]
D --> H[Request Validation]
E --> H
F --> I[Provider Discovery]
G --> I
H --> J[AbstractCore Library]
I --> J
J --> K[Provider Interface]
K --> L[LLM Providers]
style B fill:#4caf50
style J fill:#e1f5fe
style K fill:#f3e5f5
Architecture Layers:
- HTTP Layer: FastAPI-based REST API with request validation
- Translation Layer: Converts HTTP requests to AbstractCore library calls
- Core Layer: Uses the full AbstractCore provider system
- Response Layer: Transforms responses to OpenAI-compatible format
Key Capabilities:
- OpenAI Compatibility: Drop-in replacement for OpenAI API clients
- Universal Provider Access: Single API for all providers (OpenAI, Anthropic, Ollama, etc.)
- Format Conversion: Automatic tool call format conversion for agentic CLIs
- Streaming Support: Server-sent events for real-time responses
- Model Discovery: Dynamic model listing across all providers
- Embedding Support: Multi-provider embedding generation (remote OpenAI-compatible providers plus local backends)
- Optional Vision Endpoints: OpenAI-compatible /v1/images/generations and /v1/images/edits can proxy to an upstream image server without local vision runtimes; /v1/vision/* local model control remains delegated to abstractvision when installed and configured. Deep provider catalog discovery is exposed separately at /v1/vision/providers/ and /v1/vision/models.
- Optional Audio Endpoints: OpenAI-compatible /v1/audio/transcriptions and /v1/audio/speech delegated to capability plugins (typically abstractvoice). Voice/profile and TTS model discovery are exposed at /v1/audio/voices and /v1/audio/speech/models.
- Prompt Cache Control Plane: /acore/prompt_cache/* proxy endpoints for cache stats/set/update/fork/clear (best-effort; typically targets an abstractcore.endpoint upstream).
Request Flow Example:
sequenceDiagram
participant Client
participant Server as FastAPI Server
participant Core as AbstractCore
participant Provider as LLM Provider
Client->>Server: POST /v1/chat/completions
Server->>Server: Validate Request
Server->>Core: create_llm(provider, model)
Server->>Core: llm.generate(messages, tools)
Core->>Provider: API call with retry logic
Provider->>Core: Response
Core->>Core: Detect tool calls (pass-through, no execution)
Core->>Server: GenerateResponse
Server->>Server: Convert to OpenAI format
Server->>Client: HTTP Response (streaming or complete)
Server Features:
- Automatic Retry: Built-in retry logic from core library
- Event System: Full observability through events
- Debug Logging: Comprehensive request/response logging
- Health Checks: /health endpoint for monitoring
- Interactive Docs: Auto-generated Swagger UI at /docs
- Multi-Worker Support: Production deployment with multiple workers
- Same code works everywhere: Switch providers by changing one line
- No vendor lock-in: Easy migration between cloud and local providers
- Consistent semantics: tools, streaming, and structured output follow the same API surface (provider/model differences still apply)
- Automatic reliability: Built-in retry logic and circuit breakers
- Comprehensive observability: Events for every operation
- Error handling: Proper error classification and handling
- Event system: Hook into any operation
- Tool system: Add new tools easily
- Provider system: Add new providers with minimal code
- Lazy loading: Providers loaded only when needed
- Connection pooling: Reuse HTTP connections
- Efficient parsing: Optimized JSON and tool parsing
AbstractCore is designed to be extended:
from typing import List

from abstractcore.providers.base import BaseProvider
# GenerateResponse must also be imported from abstractcore (import path not shown here).

class MyProvider(BaseProvider):
    def generate(self, prompt: str, **kwargs) -> GenerateResponse:
        # Implement provider-specific logic
        return GenerateResponse(content="...")

    def get_capabilities(self) -> List[str]:
        return ["text_generation", "streaming"]

from abstractcore import tool

@tool
def my_custom_tool(param: str) -> str:
    """Custom tool that does something useful."""
    return f"Processed: {param}"

AbstractCore’s overhead is usually small compared to model inference and network latency. If performance matters, benchmark on your target provider/model/hardware.
Common levers:
- Provider choice and base URL latency
- Concurrency (async + connection pooling)
- Streaming vs non-streaming
- Structured output (schema size, retry behavior)
- Tool execution strategy (pass-through vs host execution)
- Local execution (optional): tool execution is local (never executed by the provider); by default tool calls are returned for your host/runtime to execute
- Event prevention: Stop dangerous tools before execution
- Input validation: Validate tool parameters
- Environment variables: Secure key storage
- Avoid logging: treat logs as sensitive; do not log secrets (AbstractCore tries to avoid printing keys in logs)
- Provider isolation: Keys scoped to specific providers
- Local options: Support for local providers (Ollama, MLX)
- No persistent storage by default: conversation state lives in memory (for example BasicSession) unless you explicitly save it or enable tracing/logging
- Transparent processing: All operations are observable through events
The repo uses a mix of unit tests and integration tests. Some tests are provider-/network-/hardware-dependent and are opt-in.
Quick pointers:
- Run: pytest -q
- Vision tests: tests/README_VISION_TESTING.md
- Seed tests: tests/README_SEED_TESTING.md
- Streaming/tool parsing tests: tests/streaming/ and tests/test_agentic_cli_compatibility.py
- Server/endpoint tests: tests/server/ and tests/test_abstractendpoint_singleton_provider.py
AbstractCore is a core package in the AbstractFramework ecosystem:
- AbstractFramework (umbrella): https://github.com/lpalbou/AbstractFramework
- AbstractCore (this repo): https://github.com/lpalbou/AbstractCore
- AbstractRuntime: https://github.com/lpalbou/abstractruntime
In this ecosystem, AbstractCore focuses on LLM I/O + provider abstraction, while AbstractRuntime focuses on durable execution (effects/tools/workflows/state). AbstractCore remains usable standalone; when you need durability/policy/sandboxing around tools, plug it into a runtime (for example AbstractRuntime).
graph TD
subgraph "UI Layer (peers)"
A[AbstractCode<br/>Terminal CLI]
B[AbstractFlow Visual Editor<br/>React + ReactFlow]
end
A -.->|optional| F[AbstractFlow Engine]
B --> F
F --> C[AbstractAgent]
A --> C
C --> D[AbstractRuntime]
D --> E[AbstractCore]
E --> G[LLM Providers]
style E fill:#e1f5fe
style A fill:#fff3e0
style B fill:#fff3e0
style F fill:#f3e5f5
style C fill:#f3e5f5
style D fill:#f3e5f5
- UI Layer (peers):
- AbstractCode: Terminal CLI for interactive sessions
- AbstractFlow Visual Editor: Web-based diagram editor (React + ReactFlow + FastAPI)
- AbstractFlow: Multi-agent orchestration engine + visual editor
- AbstractAgent: Agent patterns (ReactAgent, CodeActAgent) with durable execution
- AbstractRuntime: Effect system, workflows, state persistence
AbstractCode can optionally use AbstractFlow for running flows. AbstractFlow includes its own visual editor for designing workflows.
AbstractCore's architecture prioritizes:
- Reliability - Production-grade error handling and retry logic
- Simplicity - Clean APIs that are easy to understand and use
- Universality - Same interface and features across all providers
- Extensibility - Clear extension points for advanced features
- Observability - Comprehensive events for monitoring and control
- Flexibility - Deploy as Python library or OpenAI-compatible HTTP server
The result is a foundation that works reliably in production while remaining simple enough to learn quickly and flexible enough to build advanced applications on top of.