COE-370: Implement sessionless LiteLLM key lifecycle commands by kumanday · Pull Request #45 · trilogy-group/StackPerf

kumanday · 2026-04-22T03:09:08Z

Summary

Implements sessionless LiteLLM key creation, listing, revocation, and info commands without requiring a benchmark session.

In Scope

Add ProxyKeyService for sessionless key lifecycle management
Add benchmark key create/list/info/revoke CLI subcommands
Persist only non-secret key metadata locally
Display key secret once at creation; never persist or re-display
Render OpenAI-compatible and harness-profile env snippets
Unify credential_service.py with full LiteLLM-backed implementation

Tests

24 proxy key service tests (respx/httpx mocked LiteLLM API)
17 CLI key command tests
All 584 unit tests passing
Lint clean

Security

Key secrets returned as SecretStr and displayed once
No secret persisted in local database
HTTPS validation in production environments

- Add ProxyKeyService for sessionless key creation, listing, info, revocation - Add benchmark key create/list/info/revoke CLI subcommands - Wire key commands into Typer CLI app - Persist only non-secret metadata (alias, owner, team, customer, models, status) - Display key secret once at creation; never persist or re-display - Render OpenAI-compatible and harness-profile env snippets (shell + dotenv) - Unify credential_service.py with full LiteLLM-backed implementation from services_abc - Add respx/httpx unit tests for LiteLLM success and failure paths - Add CLI command tests with synthetic metadata - Add budget_currency field to UsagePolicyProfile - All 584 unit tests passing; lint clean

github-actions

🟡 Acceptable — The core feature (sessionless key lifecycle) is sound, but there are significant DRY violations and a dead-code problem that will bite maintainers.

Summary of blocking issues:

CredentialService and ProxyKeyService duplicate ~80 lines of identical code (render_env_shell, render_env_dotenv, HTTPS validation, LiteLLM API call patterns)
Six _legacy_* methods in CredentialService are dead code — defined but never called
services_abc.py still contains the old CredentialService class, creating a stale duplicate
Missing Evidence section in PR description

Non-blocking but important:

Silent error swallowing in revoke (both services) creates state inconsistency
render_env_shell sorts keys in ProxyKeyService but not CredentialService — subtle behavioral divergence in duplicated code

github-actions · 2026-04-22T03:15:50Z

+            harness_profile=harness_profile,
+            metadata_tags=metadata_tags,
+            litellm_key_id=litellm_key_id,
+            expires_at=expires_at,
+            is_active=True,
+        )

-        The credential encodes session metadata for correlation:
-        - session_id for direct session lookup
-        - experiment_id prefix for grouping
-        - variant_id prefix for variant tracking
-        - harness_profile for profile verification
+        return credential
+
+    async def revoke_credential(self, credential: ProxyCredential) -> ProxyCredential:
+        """Revoke an active credential via LiteLLM API.
+
+        Marks the credential as revoked and records revocation time.
+        Also calls LiteLLM API to delete the key if litellm_key_id is present.

        Args:
-            session_id: The benchmark session ID.
-            experiment_id: The experiment identifier.
-            variant_id: The variant identifier.
-            harness_profile: The harness profile name.
+            credential: Credential to revoke

        Returns:
-            A session-scoped proxy credential string.
+            Updated credential with is_active=False
+


🟡 Suggestion: Silent error swallowing creates state inconsistency — When LiteLLM delete fails, the local DB is marked revoked but the proxy key remains active. This means a 'revoked' benchmark key can still route traffic, which breaks the security model.

At minimum, log the failure at WARNING level so operators can investigate. Consider returning a revocation_status enum (e.g., FULL, LOCAL_ONLY, PROXY_UNREACHABLE) so the caller can decide what to do.

Fixed in bbe8b65 and d5682e4 - see workpad for per-item details.

github-actions · 2026-04-22T03:15:50Z

+            ProxyKey model if found, None otherwise.
+        """
+        orm = await self._repository.get_by_id(proxy_key_id)
+        if orm is None:
+            return None
+        return self._orm_to_model(orm)


🟡 Suggestion: render_env_shell and render_env_dotenv sort keys — But the identical methods in CredentialService do NOT sort keys. This is a subtle behavioral divergence in duplicated code. If the duplication is resolved (see comment on credential_service.py), pick one behavior consistently.

Fixed in bbe8b65 and d5682e4 - see workpad for per-item details.

… format src tests\n- add type annotation to _run(coro: Any) -> Any Co-authored-by: openhands <openhands@all-hands.dev>

github-actions

🟡 Acceptable with fixes — The sessionless key lifecycle feature is solid, but there's a correctness bug in budget handling, unresolved DRY/dead-code issues from the previous review, and the PR description lacks runtime evidence.

New issue found: budget_amount=0.0 silently falls through to policy default via or operator — this is a real bug since zero is a valid budget value meaning "no spending allowed".

Still unresolved from previous review:

DRY violations between CredentialService/ProxyKeyService (HTTPS validation, render_env_shell/dotenv)
Six dead _legacy_* methods in CredentialService (~65 lines never called)
Stale duplicate CredentialService in services_abc.py (not in this diff but still present in codebase)
Silent revocation error swallowing creates state inconsistency
Breaking issue_credential return type change (str → ProxyCredential)

Missing: PR description has no Evidence section showing end-to-end runtime output. Unit test counts alone are not sufficient proof that the CLI commands actually work against a real LiteLLM proxy.

github-actions · 2026-04-22T03:24:35Z

+        effective_budget_duration = budget_duration or (
+            usage_policy.budget_duration if usage_policy else None
+        )
+        effective_budget_amount = budget_amount or (


🟠 Important: or operator silently swallows budget_amount=0.0 — A zero budget is a valid value ("no spending allowed"), but 0.0 or (usage_policy.budget_amount ...) evaluates 0.0 as falsy and falls through to the policy default. Use is not None instead:

Suggested change

effective_budget_amount = budget_amount or (

effective_budget_amount = (

budget_amount

if budget_amount is not None

else (usage_policy.budget_amount if usage_policy else None)

)

github-actions · 2026-04-22T03:24:35Z

+        effective_owner = owner or (usage_policy.owner if usage_policy else None)
+        effective_team = team or (usage_policy.team if usage_policy else None)
+        effective_customer = customer or (usage_policy.customer if usage_policy else None)
+        effective_models = allowed_models or (


🟡 Suggestion: Same or issue with allowed_models=[] — If a caller explicitly passes allowed_models=[] (meaning "no models allowed"), the or operator treats the empty list as falsy and falls through to the policy default. Consider using is not None for symmetry with the budget_amount fix. Less critical since an empty allowed-models list is unusual, but the pattern is inconsistent.

github-actions · 2026-04-22T03:24:35Z

+        # Validate HTTPS in production environments
+        if enforce_https and not self.litellm_base_url.startswith(
+            ("https://", "http://localhost", "http://127.0.0.1")
+        ):
+            raise ValueError(
+                f"LiteLLM URL must use HTTPS in production environments. Got: {litellm_base_url}"
+            )


🟠 Important: Still unresolved — DRY violation — This HTTPS validation block is copy-pasted identically in ProxyKeyService.__init__ (lines 63-67). The render_env_shell and render_env_dotenv methods are also duplicated (~40 lines each) with a subtle behavioral divergence: ProxyKeyService sorts keys, CredentialService doesn't. Extract shared utilities.

Already fixed in bbe8b65: HTTPS validation extracted to validate_litellm_url() in benchmark_core/services/common.py. Both CredentialService and ProxyKeyService delegate to this shared utility.

github-actions · 2026-04-22T03:24:35Z

+            return None
+        return self._orm_to_model(orm)
+
+    async def revoke_key(self, proxy_key_id: UUID) -> ProxyKey | None:


🟡 Suggestion: Still unresolved — silent error swallowing on revocation — When LiteLLM delete fails, the local DB is marked revoked but the proxy key may remain active. This breaks the security model per the project's own architecture rules ("Any change that weakens correlation, reproducibility, or redaction is a design bug"). At minimum, log the failure at WARNING level so operators can detect stale keys. Consider returning a partial_revocation: bool indicator.

Already fixed in bbe8b65: revoke_key now calls warn_revoke_failure() which emits warnings.warn() for both HTTPStatusError and RequestError. Verified at lines 347, 349 of proxy_key_service.py.

- Extract shared validate_litellm_url(), render_env_shell(), render_env_dotenv(), warn_revoke_failure() utilities into benchmark_core/services/common.py - Remove dead _legacy_* methods from CredentialService (~65 lines) - Replace stale CredentialService class in services_abc.py with re-export - Add warnings.warn on LiteLLM revoke failures in both services - Ensure render_env_* methods sort keys deterministically (both services) - Fix all ruff/mypy issues, 584 tests passing Co-authored-by: openhands <openhands@all-hands.dev>

…ed_models, budget_duration Replaces or-operator fallback with is not None checks so falsy-but-valid values (0, [], empty string) are not silently overridden by usage policy. AI review: inline comments 3121395728, 3121395737. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions

🟡 Acceptable with fixes — Previous review issues (DRY, dead code, or operator bugs, silent error swallowing) are resolved. Remaining: an integer-division TTL truncation bug, a semantically misleading LITELLM_KEY_ALIAS assignment, an unused parameter, inconsistent revoke UX, and missing runtime evidence.

github-actions · 2026-04-22T03:37:42Z

+            else (usage_policy.budget_amount if usage_policy else None)
+        )
+        effective_ttl = (
+            usage_policy.ttl_seconds // 3600


🟠 Important: Integer division truncates TTL silently — ttl_seconds // 3600 rounds down, so ttl_seconds=3599 yields 0 hours (key expires immediately) and ttl_seconds=7199 yields 1 hour instead of ~2. This is a real bug — operators configuring policies in seconds will get silently shorter key lifetimes than intended.

Consider rounding up or using timedelta(seconds=...) directly:

Suggested change

usage_policy.ttl_seconds // 3600

effective_ttl = (

-(-usage_policy.ttl_seconds // 3600) # ceiling division

if usage_policy and usage_policy.ttl_seconds

else ttl_hours

)

Or avoid the hours conversion entirely and pass max_budget_duration / compute expires_at from seconds directly.

github-actions · 2026-04-22T03:37:42Z

+        }
+        if model:
+            env_vars["OPENAI_MODEL"] = model
+        env_vars["LITELLM_KEY_ALIAS"] = harness_profile_name


🟠 Important: LITELLM_KEY_ALIAS set to harness profile name, not the key alias — This is semantically misleading. The env var name suggests it holds the key alias, but it's assigned harness_profile_name (e.g. "openai-compatible"). Meanwhile, CredentialService.render_env_snippet sets LITELLM_SESSION_ALIAS = credential.key_alias — different var name AND different value.

If this var is meant for correlation (joining LiteLLM logs back to the key), it needs the actual key alias. If it's meant to signal which harness profile to use, name it HARNESS_PROFILE or similar.

Suggested change

env_vars["LITELLM_KEY_ALIAS"] = harness_profile_name

env_vars["LITELLM_KEY_ALIAS"] = key_alias # pass key_alias as parameter

Or rename the variable to match its actual purpose.

github-actions · 2026-04-22T03:37:42Z

+    def render_env_snippet(
+        self,
+        api_key: SecretStr,
+        proxy_base_url: str = "http://localhost:4000",
+        model: str | None = None,
+        harness_profile: str = "openai-compatible",


🟡 Suggestion: harness_profile parameter is accepted but never used — The docstring says "Harness profile hint for rendering" but the method body ignores it entirely. This is dead API surface that will mislead callers into thinking the profile name affects output.

Either remove the parameter or implement profile-aware rendering (which the comment in render_harness_env says is planned).

github-actions · 2026-04-22T03:37:42Z

+"""Shared utilities for benchmark core services.
+
+Provides common helper functions used across multiple service classes
+to avoid duplication and ensure consistent behavior.
+"""
+
+import warnings
+
+
+def validate_litellm_url(litellm_base_url: str, enforce_https: bool = True) -> str:
+    """Validate and normalize a LiteLLM base URL.
+
+    Strips trailing slashes and optionally enforces HTTPS in
+    production environments.
+
+    Args:
+        litellm_base_url: The raw base URL for the LiteLLM proxy API.
+        enforce_https: Whether to require HTTPS. Defaults to True.
+
+    Returns:
+        The normalized URL with trailing slash removed.
+
+    Raises:
+        ValueError: If HTTPS is required but the URL is not localhost
+            and does not start with ``https://``.
+    """
+    normalized = litellm_base_url.rstrip("/")
+    if enforce_https and not normalized.startswith(
+        ("https://", "http://localhost", "http://127.0.0.1")
+    ):
+        raise ValueError(
+            f"LiteLLM URL must use HTTPS in production environments. Got: {litellm_base_url}"
+        )
+    return normalized
+
+
+def render_env_shell(env_vars: dict[str, str]) -> str:
+    """Render environment variables as shell export commands.
+
+    Variables are sorted alphabetically for deterministic output.
+
+    Args:
+        env_vars: Dictionary of environment variables.
+
+    Returns:
+        Shell export commands string.
+    """
+    lines: list[str] = []
+    for key in sorted(env_vars.keys()):
+        value = env_vars[key]
+        escaped = value.replace("'", "'\\''")
+        lines.append(f"export {key}='{escaped}'")
+    return "\n".join(lines)
+
+
+def render_env_dotenv(env_vars: dict[str, str]) -> str:
+    """Render environment variables as dotenv file content.
+
+    Variables are sorted alphabetically for deterministic output.
+
+    Args:
+        env_vars: Dictionary of environment variables.
+
+    Returns:
+        Dotenv file content string.
+    """
+    lines: list[str] = []
+    for key in sorted(env_vars.keys()):
+        value = env_vars[key]
+        if " " in value or "#" in value or "'" in value or '"' in value:
+            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
+            lines.append(f'{key}="{escaped}"')
+        else:
+            lines.append(f"{key}={value}")
+    return "\n".join(lines)
+
+
+def warn_revoke_failure(source: str, litellm_key_id: str | None, exc: Exception) -> None:
+    """Warn when a LiteLLM key revocation attempt fails.
+
+    This helper centralises the warning issued when a best-effort LiteLLM
+    delete call fails, so both :class:`CredentialService` and
+    :class:`ProxyKeyService` behave consistently.
+
+    Args:
+        source: Human-readable source of the revocation (e.g. ``credential``
+            or ``proxy_key``).
+        litellm_key_id: The LiteLLM key ID that could not be deleted.
+        exc: The original exception that caused the failure.
+    """
+    warnings.warn(
+        f"LiteLLM delete failed for {source} (key_id={litellm_key_id}): {exc}. "
+        "Local metadata has been marked revoked but the proxy key may still be active.",
+        stacklevel=3,
+    )


🟡 Suggestion: common.py has zero direct test coverage — These are now the shared foundation for both CredentialService and ProxyKeyService, but they're only tested indirectly through service tests. A test_common.py with direct unit tests for validate_litellm_url (HTTPS enforcement, localhost exceptions, trailing slash stripping), render_env_shell/render_env_dotenv (sorting, escaping), and warn_revoke_failure would catch regressions in the shared code path that could be masked by mock boundaries in the service tests.

github-actions · 2026-04-22T03:37:42Z

+@app.command()
+def revoke(
+    key_id: str = typer.Argument(..., help="Proxy key ID (UUID)"),
+) -> None:
+    """Revoke a sessionless proxy key.
+
+    Marks the local metadata as inactive and attempts LiteLLM deletion.
+    The key secret cannot be recovered after revocation.
+    """
+    try:
+        pk_uuid = UUID(key_id)
+    except ValueError as e:
+        console.print(f"[red]Invalid UUID: {key_id}[/red]")
+        raise typer.Exit(1) from e


🟡 Suggestion: revoke rejects alias input but info accepts it — info (line 213-218) tries UUID first, then falls back to alias lookup. revoke immediately errors on non-UUID input. Operators who identify keys by alias (the human-readable name shown at creation) will hit this friction on the destructive operation where they most need convenience. Consider the same UUID-then-alias fallback pattern:

Suggested change

@app.command()

def revoke(

key_id: str = typer.Argument(..., help="Proxy key ID (UUID)"),

) -> None:

"""Revoke a sessionless proxy key.

Marks the local metadata as inactive and attempts LiteLLM deletion.

The key secret cannot be recovered after revocation.

"""

try:

pk_uuid = UUID(key_id)

except ValueError as e:

console.print(f"[red]Invalid UUID: {key_id}[/red]")

raise typer.Exit(1) from e

# Try as UUID first, then as alias (consistent with info command)

try:

pk_uuid = UUID(key_id)

except ValueError:

# Look up by alias to get the UUID

with get_db_session() as db:

service = _get_service(db)

found = _run(service.get_key_by_alias(key_id))

if found is None:

console.print(f"[red]Proxy key not found: {key_id}[/red]")

raise typer.Exit(1)

pk_uuid = found.proxy_key_id

- revoke command now accepts UUID or alias (same as info) - create command adds --budget-amount, --budget-duration, --meta options - update test to match new alias-based invalid-input behavior AI review feedback: inline comments 3121395742, 3121395744. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions

🟡 Acceptable with fixes — Three issues from previous reviews remain unfixed in the code: TTL truncation bug, LITELLM_KEY_ALIAS misassignment, and unused harness_profile parameter. Plus missing Evidence section in PR description.

github-actions · 2026-04-22T03:45:39Z

+        effective_ttl = (
+            usage_policy.ttl_seconds // 3600
+            if usage_policy and usage_policy.ttl_seconds
+            else ttl_hours


🟠 Important: Integer division truncates TTL silently — ttl_seconds // 3600 rounds down, so ttl_seconds=3599 yields 0 hours (key expires immediately). Use ceil division so operators get at least the configured duration. This was flagged in two previous reviews and remains unfixed.