Skip to content

fix(metering): default local-provider pricing to $0 for uncataloged models#1055

Closed
AL-ZiLLA wants to merge 2 commits into
RightNow-AI:mainfrom
AL-ZiLLA:fix/local-driver-zero-cost
Closed

fix(metering): default local-provider pricing to $0 for uncataloged models#1055
AL-ZiLLA wants to merge 2 commits into
RightNow-AI:mainfrom
AL-ZiLLA:fix/local-driver-zero-cost

Conversation

@AL-ZiLLA
Copy link
Copy Markdown
Contributor

Bug

estimate_cost_with_catalog falls back to (1.0, 3.0) per million tokens for any model not present in the builtin catalog. That unconditional fallback treats locally-served models — custom Ollama Modelfiles, vLLM variants, LM Studio / Lemonade / llama.cpp endpoints — as paid cloud models, even though they run on the user's own hardware and cost $0 per call.

On my deployment, two agents on a custom Ollama Modelfile (gemma4-agent:latest) were tripping the $2/hr and $8/day budget quotas with entirely fictional cost. Ledger shows 895 calls / $43.59 of phantom burn across two weeks — actual cost: $0.

Repro on main:

let catalog = ModelCatalog::new();
let cost = MeteringEngine::estimate_cost_with_catalog(
    &catalog,
    "my-custom-ollama-modelfile",
    1_000_000, 1_000_000,
);
assert_eq!(cost, 0.0); // FAILS — returns 4.0

Proof from my usage_events table:

model calls total_cost $/M tokens
gemma4:26b 390 $0.00 0.00 (catalog hit)
gemma4-agent 895 $43.59 1.01 (catalog miss → fallback)

Both are local Ollama models producing zero-cost inference. The only difference is that gemma4-agent is a user-built Modelfile alias and isn't in the builtin catalog.

Fix

Thread the provider string through estimate_cost_with_catalog and pick the fallback based on whether the provider runs inference locally:

  • Local providers (ollama, vllm, lmstudio, lm-studio, lemonade, llamacpp, llama.cpp, local) → fallback (0.0, 0.0)
  • Cloud providers → fallback unchanged at (1.0, 3.0) so an unknown cloud model surfaces a cost estimate rather than hiding it

Catalog pricing always wins if the model is registered — a known cloud model won't get silenced by a mislabeled provider hint.

Impact

  • Fixes false quota trips on custom Ollama Modelfiles and locally-served LLMs
  • Does not change pricing for any cataloged model
  • Does not change behavior for unknown cloud models (still $1/$3 fallback)

Callers updated

Three call sites in crates/openfang-kernel/src/kernel.rs now pass &manifest.model.provider. The manifest ModelConfig already carries this field (crates/openfang-types/src/agent.rs:375), so no storage or config changes are required.

Tests

Added three new tests and updated three existing ones in crates/openfang-kernel/src/metering.rs:

  • test_estimate_cost_with_catalog_unknown_local_is_free — every supported local provider string returns $0 for an unknown model; verifies case-insensitive matching
  • test_estimate_cost_with_catalog_known_model_ignores_provider_hint — catalog pricing wins over the provider hint
  • test_is_local_provider — unit test for the helper
  • Updated existing alias / catalog-hit / unknown-cloud tests to pass a provider argument

Full workspace cargo test --release passes (1,300+ tests, 0 failures). cargo clippy -p openfang-runtime -p openfang-kernel -p openfang-api -- -D warnings clean.

ALZiLLA and others added 2 commits April 14, 2026 11:36
1. CSP: x-frame-options SAMEORIGIN + frame-ancestors for localhost:3000
   (allows Command Center iframe embedding)
2. reasoning serde alias: accept both "reasoning" (Gemma 4 via Ollama)
   and "reasoning_content" (DeepSeek-R1, Qwen3) in non-streaming responses

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…odels

estimate_cost_with_catalog previously fell back to ($1/M input, $3/M
output) for any model not in the builtin catalog. Custom local
Modelfiles — e.g. an Ollama `gemma4-agent:latest` built via the
Ollama CLI — miss the catalog and so were charged as if they were
a paid cloud model, tripping budget quotas on zero-cost inference.

Fix: thread the provider string through estimate_cost_with_catalog
and pick the fallback based on whether the provider runs inference
locally. For ollama/vllm/lmstudio/lemonade/llamacpp/local, default
to ($0, $0). Cloud providers still default to ($1, $3) so an
unknown cloud model surfaces a cost estimate rather than hiding it.

Catalog pricing always wins if the model IS registered — a known
model tagged with a local provider hint still uses catalog prices.

Added unit tests covering: local-unknown is free, cloud-unknown
uses default, known model ignores the provider hint, case-insensitive
provider matching, and every supported local-provider string.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@jaberjaber23
Copy link
Copy Markdown
Member

Thanks @AL-ZiLLA — the metering fix is genuinely useful (local-GPU users running uncataloged Ollama/vLLM model IDs were getting phantom cloud pricing in budgets). But this PR bundles three unrelated concerns and can't land as one.

Ask: please split into 3 PRs

  1. fix(metering): default local-provider pricing to $0 — the core change in crates/openfang-kernel/src/metering.rs. Double-check is_local_provider against the full set of local identifiers (ollama, llamafile, vllm, lmstudio, localai, tabby, plus custom base_url-driven "openai-compatible" aliases). This is the one we want to land first.
  2. chore(api): iframe / CSP policy update — the X-Frame-Options: SAMEORIGIN + frame-ancestors changes in crates/openfang-api/src/middleware.rs. This is a security-relevant change (clickjacking posture) and needs its own security sign-off. Relaxing frame-ancestors to http://localhost:3000 is fine for local dev but we need to be explicit about the threat model if the dashboard is ever exposed on non-localhost.
  3. fix(openai): accept reasoning / reasoning_content aliases — the serde alias in drivers/openai.rs.

CI is currently red on this branch; after splitting and rebasing on post-#1041 main the metering PR should go green quickly.

Thanks again — looking forward to landing (1) fast.

Copy link
Copy Markdown
Member

@jaberjaber23 jaberjaber23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title says metering but the first diff hunk weakens X-Frame-Options from DENY to SAMEORIGIN and adds http://localhost:3000 to frame-ancestors. That is a clickjacking-relevant change hidden in a metering PR for a private Command Center deployment.

To merge:

  1. Split the metering pricing fix into its own PR (that part is fine on its own).
  2. The frame-ancestors change should be a separate, opt-in config field (e.g. dashboard.embed_origins in config.toml) defaulting to none. Otherwise every install ships a relaxed CSP for one operator's local proxy.

Holding here as request-changes for the security regression.

@jaberjaber23
Copy link
Copy Markdown
Member

Closing this in favor of a clean re-submission.

The security regression flagged on 2026-04-17 is still in the diff at d14f706:

  • x-frame-options downgraded from DENY to SAMEORIGIN
  • frame-ancestors relaxed from 'none' to 'self' http://localhost:3000

Both ship in crates/openfang-api/src/middleware.rs and weaken clickjacking posture for every install, not just yours. No author response in 4 weeks.

To get the metering fix landed, open a new PR with only the crates/openfang-kernel/src/metering.rs + kernel.rs changes (the is_local_provider $0/$0 fallback). That part is good and we want it.

The CSP / X-Frame-Options change needs its own PR and must be opt-in via config (e.g. dashboard.embed_origins in config.toml, default empty). The OpenAI reasoning serde alias should be a third PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants