Add model load/unload/list loaded command routing#839
Conversation
… running in that mode. Long running process hosts any inferencing sessions in that case (e.g. CLI usage).
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR introduces a ModelCommandRouter façade in the C++ SDK (sdk_v2/cpp) that centralizes the local-vs-external decision for the three model-management commands (list-loaded / load / unload). When Configuration::external_service_url is set, these commands are redirected over HTTP to a remote Foundry Local service (supporting the "long-running host process" / CLI scenario); otherwise they delegate to the in-process ModelLoadManager. Model and BaseModelCatalog swap their ModelLoadManager binding for the router, keeping them mode-agnostic.
Changes:
- New
ModelCommandRouter(model_command_router.{h,cc}) plus a sharedUrlEncodehelper hoisted intoutils.{h,cc};BaseModelCatalog::GetLoadedModels()rewritten as a single batch call;ModelLoadManager::GetLoadedModelIds()added. Managerconstructs/owns the router (with documented declaration/destruction ordering) and threads it through the catalog andModel::FromModelInfo.- Adds an explicit
INVALID_USAGEguard for local session creation in external mode (c_api.cc), a new router unit/integration test, an architecture plan doc, and updated test call sites.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/model_command_router.{h,cc} |
New routing façade; external Load/Unload target alias-keyed endpoints with model_id (bug) |
src/utils.{h,cc} |
Hoist UrlEncode to a shared helper (std::string_view) |
src/download/model_registry_client.cc |
Remove local UrlEncode, use shared one |
src/model.{h,cc} |
Swap ModelLoadManager* → ModelCommandRouter*; Load/Unload/IsLoaded delegate to router |
src/catalog/base_model_catalog.{h,cc} |
Take router in ctor; batch GetLoadedModels() via ListLoadedModelIds() |
src/catalog/azure_model_catalog.{h,cc} |
Forward router to base ctor |
src/manager.{h,cc} |
Own/construct router; wiring + teardown ordering |
src/inferencing/model_load_manager.{h,cc} |
Add GetLoadedModelIds() (locked snapshot) |
src/c_api.cc |
Guard local session creation in external mode |
docs/ModelCommandRoutingPlan.md |
Design/plan document |
CMakeLists.txt (cpp & test) |
Register new source and test |
test/internal_api/*, test/sdk_api/cache_only_test.cc |
Update call sites; new router tests; session-guard test |
Key finding: The web service load/unload endpoints (/models/load/{name}, /models/unload/{name}) resolve {name} by alias (BaseModelCatalog::GetModel), but Model::Load/Model::Unload send info_.model_id through the router, while /models/loaded returns model_ids. For any real model (where model_id = "<name>:<version>" ≠ alias), external-mode Load/Unload will 404 and throw. The new round-trip test masks this by calling the router directly with an alias instead of exercising Model::Load.
Add comprehensive integration test (and fix issue with '/shutdown' endpoint).
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Add redirection of model load/unload/list loaded to external url when running in that mode.
Long running process hosts any inferencing sessions in that case (e.g. CLI usage).