Add support for local LLMs and fix a rendering bug #1
directionless wants to merge 2 commits into 1Password:main
Conversation
Pull request overview
This PR adds first-class support for running SCAM against local LLM endpoints (Ollama + vLLM) using a provider/model naming scheme, and fixes an HTML export rendering issue in the agentic dashboard summary.
Changes:
- Add local model parsing/routing (`ollama/…`, `vllm-openai/…`, `vllm-anthropic/…`) and wire it into `create_model()` and provider resolution.
- Add Ollama model discovery for interactive selection and expand docs/tests for local model usage.
- Fix unbalanced HTML in the single-run summary branch of the agentic dashboard export.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `tests/test_agentic.py` | Adds tests for local model parsing/creation and Ollama discovery behavior. |
| `scam/utils/config.py` | Adds local-provider prefixes, default local base URLs, and local model parsing/provider resolution. |
| `scam/models/openai.py` | Allows the OpenAI client to be configured with an optional base_url/api_key for local endpoints. |
| `scam/models/anthropic.py` | Allows the Anthropic client to be configured with an optional base_url/api_key for local endpoints. |
| `scam/models/__init__.py` | Routes provider/model IDs to the correct model class and base URL defaults. |
| `scam/models/discovery.py` | Adds Ollama discovery (`/api/tags`) and integrates it into interactive model selection. |
| `scam/agentic/export_html.py` | Fixes a missing `</div>` in single-run summary HTML generation. |
| `AGENTS.md` | Documents running benchmarks against local endpoints and venv-based test invocation. |
```python
mock_resp.__enter__ = lambda self: self
mock_resp.__exit__ = lambda *a: None
```
The context-manager mocking here is incorrect: assigning __enter__ as lambda self: self will be called with no arguments by the with statement, causing a TypeError. Use mock_resp.__enter__.return_value = mock_resp (and similarly for __exit__) or use contextlib.nullcontext/MagicMock()’s built-in context-manager methods instead.
```suggestion
mock_resp.__enter__.return_value = mock_resp
mock_resp.__exit__.return_value = None
```
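A minimal, self-contained sketch of the recommended pattern: `MagicMock` already pre-configures `__enter__`/`__exit__`, so pointing `__enter__.return_value` at the mock itself makes the `with` statement bind the mock directly (the `read` payload below is an illustrative stand-in for the mocked HTTP response):

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for the mocked HTTP response used in the test.
mock_resp = MagicMock()
mock_resp.read.return_value = b'{"models": []}'

# Configure the pre-existing context-manager support instead of
# assigning raw lambdas to the dunder attributes.
mock_resp.__enter__.return_value = mock_resp
mock_resp.__exit__.return_value = False

with mock_resp as r:
    assert r is mock_resp
    assert r.read() == b'{"models": []}'
```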
```python
model = create_model("ollama/llama3.2")
assert isinstance(model, OpenAIModel)
assert model.model_name == "llama3.2"
assert model.client._client.base_url.host == "localhost"
assert "11434" in str(model.client._client.base_url)
```
These assertions reach into the OpenAI/Anthropic SDK private attribute client._client, which is brittle and likely to break on dependency upgrades. Consider storing the configured base_url on the wrapper model (e.g., self.base_url) and asserting against that, or use a documented/public attribute if the SDK exposes one.
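One way to follow this suggestion, sketched with an illustrative wrapper (the real `OpenAIModel` wraps the SDK client, elided here; names mirror the PR but the constructor body is an assumption):

```python
class OpenAIModel:
    """Illustrative wrapper; the real class also constructs an SDK client."""

    def __init__(self, model_name, base_url=None, api_key=None):
        self.model_name = model_name
        # Expose the configured endpoint publicly so tests can assert on it
        # without reaching into client._client.
        self.base_url = base_url
        self.api_key = api_key
        # self.client = OpenAI(base_url=base_url, api_key=api_key)  # SDK call elided

model = OpenAIModel("llama3.2", base_url="http://localhost:11434/v1", api_key="ollama")
assert model.base_url == "http://localhost:11434/v1"  # no private SDK attributes needed
```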
```python
    Returns None if not a local provider format.
    """
    for prefix in LOCAL_PROVIDER_PREFIXES:
        if model_name.startswith(prefix):
            return prefix.rstrip("/"), model_name[len(prefix) :].strip() or model_name
```
parse_local_model_id can return the original model_name (including the provider prefix) when the suffix is empty (e.g. "ollama/"). That will propagate into create_model as an invalid model id. Prefer returning None (or raising a ValueError) when no model id is provided after the prefix, instead of falling back to model_name.
```suggestion
    Returns None if not a local provider format or if no model id is provided after the prefix.
    """
    for prefix in LOCAL_PROVIDER_PREFIXES:
        if model_name.startswith(prefix):
            model_id = model_name[len(prefix) :].strip()
            if not model_id:
                # No model id after the provider prefix (e.g. "ollama/"); treat as invalid.
                return None
            return prefix.rstrip("/"), model_id
```
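A self-contained sketch of the suggested behavior, with the prefix tuple inlined for illustration (the real constant lives in `scam/utils/config.py`):

```python
LOCAL_PROVIDER_PREFIXES = ("ollama/", "vllm-openai/", "vllm-anthropic/")

def parse_local_model_id(model_name):
    """Split 'provider/model' for local providers; None if not local or empty."""
    for prefix in LOCAL_PROVIDER_PREFIXES:
        if model_name.startswith(prefix):
            model_id = model_name[len(prefix):].strip()
            if not model_id:
                return None  # bare prefix like "ollama/" is invalid
            return prefix.rstrip("/"), model_id
    return None

assert parse_local_model_id("ollama/llama3.2") == ("ollama", "llama3.2")
assert parse_local_model_id("ollama/") is None       # no fallback to the raw name
assert parse_local_model_id("gpt-4o") is None        # not a local provider
```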
```python
    if provider == "ollama":
        base_url = os.environ.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_BASE_URL)
        return OpenAIModel(model_id, base_url=base_url, api_key="ollama")
    if provider == "vllm-openai":
        base_url = os.environ.get("VLLM_OPENAI_BASE_URL", DEFAULT_VLLM_OPENAI_BASE_URL)
        api_key = os.environ.get("OPENAI_API_KEY", "dummy")
        return OpenAIModel(model_id, base_url=base_url, api_key=api_key)
```
OLLAMA_BASE_URL is treated as an OpenAI-compatible base URL here, but users will commonly set it to http://localhost:11434 (without /v1). In that case the OpenAI SDK will hit /chat/completions instead of /v1/chat/completions. Consider normalizing the env var (accept both with and without /v1) before passing it to OpenAIModel, or rename/document the env var to clearly require the /v1 suffix.
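A minimal normalization helper along the lines suggested, accepting the env var with or without the `/v1` suffix (the helper name is an assumption, not part of the PR):

```python
def normalize_openai_base_url(base_url: str) -> str:
    """Accept a base URL with or without a trailing /v1 (illustrative sketch)."""
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        base = base + "/v1"
    return base

# Both spellings of the env var end up at the OpenAI-compatible endpoint.
assert normalize_openai_base_url("http://localhost:11434") == "http://localhost:11434/v1"
assert normalize_openai_base_url("http://localhost:11434/v1") == "http://localhost:11434/v1"
```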
```python
base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
# Ollama tags endpoint is at /api/tags, not under /v1
if base.endswith("/v1"):
    base = base[:-3]
url = f"{base}/api/tags"
```
list_ollama_models defaults OLLAMA_BASE_URL to http://localhost:11434 and strips an optional /v1, but create_model uses the same env var as an OpenAI base URL (which typically must include /v1). This inconsistency can lead to a working model list but broken inference calls depending on how the env var is set. It’d be safer to standardize the semantics (e.g., always treat OLLAMA_BASE_URL as the host root and append /v1 only for OpenAI calls, or introduce separate env vars).
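One way to standardize as suggested: treat `OLLAMA_BASE_URL` as the host root and derive both endpoint URLs from it (helper names are illustrative, not SCAM's API):

```python
import os

def ollama_root() -> str:
    """Read OLLAMA_BASE_URL as the host root, tolerating a trailing /v1 (sketch)."""
    base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
    if base.endswith("/v1"):
        base = base[:-3].rstrip("/")
    return base

def ollama_openai_url() -> str:
    return ollama_root() + "/v1"        # OpenAI-compatible chat endpoint

def ollama_tags_url() -> str:
    return ollama_root() + "/api/tags"  # native model-discovery endpoint

# Either spelling of the env var yields consistent URLs for both call sites.
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434/v1"
assert ollama_openai_url() == "http://localhost:11434/v1"
assert ollama_tags_url() == "http://localhost:11434/api/tags"
```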
```python
# Single run, no reproducibility data — close lb-stats (opened above)
p.append(' </div>')
```
This fixes the unbalanced HTML for the single-run branch, but there’s no test exercising _generate_summary_html/dashboard generation. Since the project already has HTML export tests, consider adding a regression test that generates a dashboard with n_phase_runs == 1 and asserts the expected section contains balanced lb-stats markup (or at least that the output contains the closing </div>).
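A self-contained sketch of the kind of balance check such a regression test could use (in the real test this would run over the dashboard output for `n_phase_runs == 1`; the helper is illustrative):

```python
import re

def divs_balanced(html: str) -> bool:
    """Cheap check that <div> open/close tags balance (illustrative helper)."""
    opens = len(re.findall(r"<div\b", html))
    closes = html.count("</div>")
    return opens == closes

assert divs_balanced('<div class="lb-stats"><div>ok</div></div>')
assert not divs_balanced('<div class="lb-stats"><div>ok</div>')  # missing closer
```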
(Both of these changes were generated by Cursor.)
Add support for using local LLMs, a bit more directly than monkey-patching the API URLs. It seems to work, at least: I ran SCAM against several local models on my M3 MacBook.
Fix a small rendering bug in the exported reports.