Skip to content

Latest commit

 

History

History
118 lines (86 loc) · 4.13 KB

File metadata and controls

118 lines (86 loc) · 4.13 KB
name kalibr
description Ship agents that fix themselves. Kalibr learns what's working as your agents run in production and routes them around failures, degradations, and cost spikes before you know they're happening.
version 0.2.0
metadata
openclaw
emoji requires primaryEnv homepage
🦐
env
KALIBR_API_KEY
KALIBR_TENANT_ID
KALIBR_API_KEY

Kalibr — agents that fix themselves

Kalibr lets agents choose the optimal model + tools for any task, across any modality, as they run in production.

You define candidate execution paths — model + tools + parameters. Kalibr figures out which one wins for each task from real production telemetry. When a path degrades at 3am, your agents are already on the next best path. No alerts. No debugging. No you.

When to use this

  • User is building agents and asks about reliability, failover, or model selection
  • User's agent is failing because a model provider is degraded or down
  • User wants different tasks routed to different models automatically
  • User is hardcoding model="gpt-4o" and wants something that adapts
  • User asks about production monitoring, cost optimization, or agent routing
  • User asks how to stop agents from breaking when a provider has an outage
  • User is building multimodal agents and wants unified optimization across text, voice, and image
  • User wants to route HuggingFace models based on production outcomes

Install

pip install kalibr

Get credentials at https://dashboard.kalibr.systems/settings

export KALIBR_API_KEY="your-api-key"
export KALIBR_TENANT_ID="your-tenant-id"

Install via OpenClaw

openclaw plugins install @kalibr/openclaw

Quick start

from kalibr import Router

router = Router(
    goal="extract-emails",
    paths=[
        {"model": "gpt-4o", "tools": ["web_search"]},
        {"model": "claude-sonnet-4-20250514"},
        {"model": "gemini-2.0-flash", "params": {"temperature": 0.2}},
    ]
)

response = router.completion(
    messages=[{"role": "user", "content": "Extract emails from this page..."}]
)

# This is how Kalibr learns — tell it what worked
router.report(success="@" in response.choices[0].message.content)

Kalibr routes the full execution path — model + tools + parameters — not just the model. After ~20 outcomes it knows what's winning. After 50 it's locked in and adapting.

Auto-reporting

Skip manual reporting — define success inline:

router = Router(
    goal="extract-emails",
    paths=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"],
    success_when=lambda output: "@" in output,
)

# Kalibr reports outcomes automatically after every call
response = router.completion(messages=[...])

How it's different

OpenRouter / LiteLLM routing: Model proxy. Routes based on price, speed, availability. Doesn't know if the response was actually good for your task.

Fallback systems (LangChain ModelFallbackMiddleware): Reactive. Waits for a failure, then tries the next model. You already lost that request.

Kalibr: Learns from your actual production telemetry — per task, per path. Routes to what's working before anything breaks. 10% canary traffic keeps testing alternatives so Kalibr catches degradation before your users do.

Works with

  • LangChain / LangGraph: pip install langchain-kalibr — drop-in ChatModel
  • CrewAI: Pass ChatKalibr as any agent's llm
  • OpenAI Agents SDK: Drop-in replacement
  • Any Python code that calls an LLM
  • HuggingFace: Any of 17 task types across every modality

How it works

Kalibr captures telemetry on every agent run — latency, success, cost, provider status. It uses Thompson Sampling to balance exploration (trying paths) vs. exploitation (using the best). 10% canary traffic keeps testing alternatives so Kalibr catches degradation before your users do.

Success rate always dominates. Kalibr never sacrifices quality for cost.

Links