
Commit 0a50741

Merge branch 'main' into lcore-1247
2 parents 5614c15 + 2916319

40 files changed (+914 −510 lines)

README.md

Lines changed: 57 additions & 0 deletions
@@ -73,6 +73,7 @@ The service includes comprehensive user data collection capabilities for various
 * [OpenAPI specification](#openapi-specification)
 * [Readiness Endpoint](#readiness-endpoint)
 * [Liveness Endpoint](#liveness-endpoint)
+* [Models endpoint](#models-endpoint)
 * [Database structure](#database-structure)
 * [Publish the service as Python package on PyPI](#publish-the-service-as-python-package-on-pypi)
 * [Generate distribution archives to be uploaded into Python registry](#generate-distribution-archives-to-be-uploaded-into-python-registry)
@@ -1045,6 +1046,62 @@ The liveness endpoint performs a basic health check to verify the service is alive
 }
 ```
 
+## Models endpoint
+
+**Endpoint:** `GET /v1/models`
+
+Processes GET requests and returns a list of available models from the Llama
+Stack service. The optional "model_type" query parameter can be used as a
+filter. For example, if the model type is set to "llm", only LLM models
+will be returned:
+
+```bash
+curl http://localhost:8080/v1/models?model_type=llm
+```
+
+When "model_type" is not specified, all models are returned.
+
+**Response Body:**
+```json
+{
+  "models": [
+    {
+      "identifier": "sentence-transformers/.llama",
+      "metadata": {
+        "embedding_dimension": 384
+      },
+      "api_model_type": "embedding",
+      "provider_id": "sentence-transformers",
+      "type": "model",
+      "provider_resource_id": ".llama",
+      "model_type": "embedding"
+    },
+    {
+      "identifier": "openai/gpt-4o-mini",
+      "metadata": {},
+      "api_model_type": "llm",
+      "provider_id": "openai",
+      "type": "model",
+      "provider_resource_id": "gpt-4o-mini",
+      "model_type": "llm"
+    },
+    {
+      "identifier": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
+      "metadata": {
+        "embedding_dimension": 768
+      },
+      "api_model_type": "embedding",
+      "provider_id": "sentence-transformers",
+      "type": "model",
+      "provider_resource_id": "nomic-ai/nomic-embed-text-v1.5",
+      "model_type": "embedding"
+    }
+  ]
+}
+```
+
 # Database structure
 
 Database structure is described on [this page](https://lightspeed-core.github.io/lightspeed-stack/DB/index.html)

docs/openapi.json

Lines changed: 56 additions & 9 deletions
@@ -245,7 +245,7 @@
         "models"
       ],
       "summary": "Models Endpoint Handler",
-      "description": "Handle requests to the /models endpoint.\n\nProcess GET requests to the /models endpoint, returning a list of available\nmodels from the Llama Stack service.\n\nParameters:\n request: The incoming HTTP request.\n auth: Authentication tuple from the auth dependency.\n model_type: Optional filter to return only models matching this type.\n\nRaises:\n HTTPException: If unable to connect to the Llama Stack server or if\n model retrieval fails for any reason.\n\nReturns:\n ModelsResponse: An object containing the list of available models.",
+      "description": "Handle requests to the /models endpoint.\n\nProcess GET requests to the /models endpoint, returning a list of available\nmodels from the Llama Stack service. It is possible to specify \"model_type\"\nquery parameter that is used as a filter. For example, if model type is set\nto \"llm\", only LLM models will be returned:\n\n curl http://localhost:8080/v1/models?model_type=llm\n\nThe \"model_type\" query parameter is optional. When not specified, all models\nwill be returned.\n\n### Parameters:\n request: The incoming HTTP request.\n auth: Authentication tuple from the auth dependency.\n model_type: Optional filter to return only models matching this type.\n\n### Raises:\n HTTPException: If unable to connect to the Llama Stack server or if\n model retrieval fails for any reason.\n\n### Returns:\n ModelsResponse: An object containing the list of available models.",
       "operationId": "models_endpoint_handler_v1_models_get",
       "parameters": [
         {
@@ -890,7 +890,7 @@
         "providers"
       ],
       "summary": "Get Provider Endpoint Handler",
-      "description": "Retrieve a single provider by its unique ID.\n\nReturns:\n ProviderResponse: Provider details.\n\nRaises:\n HTTPException:\n - 401: Authentication failed\n - 403: Authorization failed\n - 404: Provider not found\n - 500: Lightspeed Stack configuration not loaded\n - 503: Unable to connect to Llama Stack",
+      "description": "Retrieve a single provider identified by its unique ID.\n\nReturns:\n ProviderResponse: Provider details.\n\nRaises:\n HTTPException:\n - 401: Authentication failed\n - 403: Authorization failed\n - 404: Provider not found\n - 500: Lightspeed Stack configuration not loaded\n - 503: Unable to connect to Llama Stack",
       "operationId": "get_provider_endpoint_handler_v1_providers__provider_id__get",
       "parameters": [
         {
@@ -1170,7 +1170,7 @@
         "rags"
       ],
       "summary": "Get Rag Endpoint Handler",
-      "description": "Retrieve a single RAG by its unique ID.\n\nAccepts both user-facing rag_id (from LCORE config) and llama-stack\nvector_store_id. If a rag_id from config is provided, it is resolved\nto the underlying vector_store_id for the llama-stack lookup.\n\nReturns:\n RAGInfoResponse: A single RAG's details.\n\nRaises:\n HTTPException:\n - 401: Authentication failed\n - 403: Authorization failed\n - 404: RAG with the given ID not found\n - 500: Lightspeed Stack configuration not loaded\n - 503: Unable to connect to Llama Stack",
+      "description": "Retrieve a single RAG identified by its unique ID.\n\nAccepts both user-facing rag_id (from LCORE config) and llama-stack\nvector_store_id. If a rag_id from config is provided, it is resolved\nto the underlying vector_store_id for the llama-stack lookup.\n\nReturns:\n RAGInfoResponse: A single RAG's details.\n\nRaises:\n HTTPException:\n - 401: Authentication failed\n - 403: Authorization failed\n - 404: RAG with the given ID not found\n - 500: Lightspeed Stack configuration not loaded\n - 503: Unable to connect to Llama Stack",
       "operationId": "get_rag_endpoint_handler_v1_rags__rag_id__get",
       "parameters": [
         {
@@ -2489,7 +2489,7 @@
         "conversations_v1"
       ],
      "summary": "Conversation Get Endpoint Handler V1",
-      "description": "Handle request to retrieve a conversation by ID using Conversations API.\n\nRetrieve a conversation's chat history by its ID using the LlamaStack\nConversations API. This endpoint fetches the conversation items from\nthe backend, simplifies them to essential chat history, and returns\nthem in a structured response. Raises HTTP 400 for invalid IDs, 404\nif not found, 503 if the backend is unavailable, and 500 for\nunexpected errors.\n\nArgs:\n request: The FastAPI request object\n conversation_id: Unique identifier of the conversation to retrieve\n auth: Authentication tuple from dependency\n\nReturns:\n ConversationResponse: Structured response containing the conversation\n ID and simplified chat history",
+      "description": "Handle request to retrieve a conversation identified by ID using Conversations API.\n\nRetrieve a conversation's chat history by its ID using the LlamaStack\nConversations API. This endpoint fetches the conversation items from\nthe backend, simplifies them to essential chat history, and returns\nthem in a structured response. Raises HTTP 400 for invalid IDs, 404\nif not found, 503 if the backend is unavailable, and 500 for\nunexpected errors.\n\nArgs:\n request: The FastAPI request object\n conversation_id: Unique identifier of the conversation to retrieve\n auth: Authentication tuple from dependency\n\nReturns:\n ConversationResponse: Structured response containing the conversation\n ID and simplified chat history",
       "operationId": "get_conversation_endpoint_handler_v1_conversations__conversation_id__get",
       "parameters": [
         {
@@ -3179,7 +3179,7 @@
         "conversations_v2"
       ],
       "summary": "Get Conversation Endpoint Handler",
-      "description": "Handle request to retrieve a conversation by ID.",
+      "description": "Handle request to retrieve a conversation identified by its ID.",
       "operationId": "get_conversation_endpoint_handler_v2_conversations__conversation_id__get",
       "parameters": [
         {
@@ -3763,6 +3763,26 @@
           }
         }
       },
+      "413": {
+        "description": "Prompt is too long",
+        "content": {
+          "application/json": {
+            "schema": {
+              "$ref": "#/components/schemas/PromptTooLongResponse"
+            },
+            "examples": {
+              "prompt too long": {
+                "value": {
+                  "detail": {
+                    "cause": "The prompt exceeds the maximum allowed length.",
+                    "response": "Prompt is too long"
+                  }
+                }
+              }
+            }
+          }
+        }
+      },
       "422": {
         "description": "Request validation failed",
         "content": {
@@ -4312,7 +4332,7 @@
       ],
       "summary": "Handle A2A Jsonrpc",
       "description": "Handle A2A JSON-RPC requests following the A2A protocol specification.\n\nThis endpoint uses the DefaultRequestHandler from the A2A SDK to handle\nall JSON-RPC requests including message/send, message/stream, etc.\n\nThe A2A SDK application is created per-request to include authentication\ncontext while still leveraging FastAPI's authorization middleware.\n\nAutomatically detects streaming requests (message/stream JSON-RPC method)\nand returns a StreamingResponse to enable real-time chunk delivery.\n\nArgs:\n request: FastAPI request object\n auth: Authentication tuple\n mcp_headers: MCP headers for context propagation\n\nReturns:\n JSON-RPC response or streaming response",
-      "operationId": "handle_a2a_jsonrpc_a2a_post",
+      "operationId": "handle_a2a_jsonrpc_a2a_get",
       "responses": {
         "200": {
           "description": "Successful Response",
@@ -4330,7 +4350,7 @@
       ],
       "summary": "Handle A2A Jsonrpc",
       "description": "Handle A2A JSON-RPC requests following the A2A protocol specification.\n\nThis endpoint uses the DefaultRequestHandler from the A2A SDK to handle\nall JSON-RPC requests including message/send, message/stream, etc.\n\nThe A2A SDK application is created per-request to include authentication\ncontext while still leveraging FastAPI's authorization middleware.\n\nAutomatically detects streaming requests (message/stream JSON-RPC method)\nand returns a StreamingResponse to enable real-time chunk delivery.\n\nArgs:\n request: FastAPI request object\n auth: Authentication tuple\n mcp_headers: MCP headers for context propagation\n\nReturns:\n JSON-RPC response or streaming response",
-      "operationId": "handle_a2a_jsonrpc_a2a_post",
+      "operationId": "handle_a2a_jsonrpc_a2a_get",
       "responses": {
         "200": {
           "description": "Successful Response",
@@ -5882,7 +5902,7 @@
         "conversation_id"
       ],
       "title": "ConversationDetails",
-      "description": "Model representing the details of a user conversation.\n\nAttributes:\n conversation_id: The conversation ID (UUID).\n created_at: When the conversation was created.\n last_message_at: When the last message was sent.\n message_count: Number of user messages in the conversation.\n last_used_model: The last model used for the conversation.\n last_used_provider: The provider of the last used model.\n topic_summary: The topic summary for the conversation.\n\nExample:\n ```python\n conversation = ConversationDetails(\n conversation_id=\"123e4567-e89b-12d3-a456-426614174000\"\n created_at=\"2024-01-01T00:00:00Z\",\n last_message_at=\"2024-01-01T00:05:00Z\",\n message_count=5,\n last_used_model=\"gemini/gemini-2.0-flash\",\n last_used_provider=\"gemini\",\n topic_summary=\"Openshift Microservices Deployment Strategies\",\n )\n ```"
+      "description": "Model representing the details of a user conversation.\n\nAttributes:\n conversation_id: The conversation ID (UUID).\n created_at: When the conversation was created.\n last_message_at: When the last message was sent.\n message_count: Number of user messages in the conversation.\n last_used_model: The last model used for the conversation.\n last_used_provider: The provider of the last used model.\n topic_summary: The topic summary for the conversation.\n\nExample:\n ```python\n conversation = ConversationDetails(\n conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n created_at=\"2024-01-01T00:00:00Z\",\n last_message_at=\"2024-01-01T00:05:00Z\",\n message_count=5,\n last_used_model=\"gemini/gemini-2.0-flash\",\n last_used_provider=\"gemini\",\n topic_summary=\"Openshift Microservices Deployment Strategies\",\n )\n ```"
     },
     "ConversationHistoryConfiguration": {
       "properties": {
@@ -7201,7 +7221,7 @@
       },
       "type": "object",
       "title": "Authorization headers",
-      "description": "Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 2 special cases: 1. Usage of the kubernetes token in the header. To specify this use a string 'kubernetes' instead of the file path. 2. Usage of the client provided token in the header. To specify this use a string 'client' instead of the file path."
+      "description": "Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 3 special cases: 1. Usage of the kubernetes token in the header. To specify this use a string 'kubernetes' instead of the file path. 2. Usage of the client-provided token in the header. To specify this use a string 'client' instead of the file path. 3. Usage of the oauth token in the header. To specify this use a string 'oauth' instead of the file path."
     },
     "timeout": {
       "anyOf": [
@@ -7565,6 +7585,33 @@
       "title": "PostgreSQLDatabaseConfiguration",
       "description": "PostgreSQL database configuration.\n\nPostgreSQL database is used by Lightspeed Core Stack service for storing\ninformation about conversation IDs. It can also be leveraged to store\nconversation history and information about quota usage.\n\nUseful resources:\n\n- [Psycopg: connection classes](https://www.psycopg.org/psycopg3/docs/api/connections.html)\n- [PostgreSQL connection strings](https://www.connectionstrings.com/postgresql/)\n- [How to Use PostgreSQL in Python](https://www.freecodecamp.org/news/postgresql-in-python/)"
     },
+    "PromptTooLongResponse": {
+      "properties": {
+        "status_code": {
+          "type": "integer",
+          "title": "Status Code"
+        },
+        "detail": {
+          "$ref": "#/components/schemas/DetailModel"
+        }
+      },
+      "type": "object",
+      "required": [
+        "status_code",
+        "detail"
+      ],
+      "title": "PromptTooLongResponse",
+      "description": "413 Payload Too Large - Prompt is too long.",
+      "examples": [
+        {
+          "detail": {
+            "cause": "The prompt exceeds the maximum allowed length.",
+            "response": "Prompt is too long"
+          },
+          "label": "prompt too long"
+        }
+      ]
+    },
     "ProviderHealthStatus": {
       "properties": {
         "provider_id": {