Commit 60de883

Support watsonx provider

1 parent d649176
File tree

13 files changed (+441, -38 lines)

.github/workflows/e2e_tests_providers.yaml

Lines changed: 14 additions & 1 deletion

```diff
@@ -13,7 +13,7 @@ jobs:
       fail-fast: false
       matrix:
         mode: ["server", "library"]
-        environment: ["azure", "vertexai"]
+        environment: ["azure", "vertexai", "watsonx"]
 
     name: "E2E: ${{ matrix.mode }} mode / ${{ matrix.environment }}"
@@ -203,6 +203,9 @@ jobs:
           VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
           GOOGLE_APPLICATION_CREDENTIALS: ${{ env.GOOGLE_APPLICATION_CREDENTIALS }}
           GCP_KEYS_PATH: ${{ env.GCP_KEYS_PATH }}
+          WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
+          WATSONX_API_KEY: ${{ secrets.WATSONX_API_KEY }}
+
         run: |
           # Debug: Check if environment variable is available for docker-compose
           echo "OPENAI_API_KEY is set: $([ -n "$OPENAI_API_KEY" ] && echo 'YES' || echo 'NO')"
@@ -229,6 +232,9 @@ jobs:
           VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
           GOOGLE_APPLICATION_CREDENTIALS: ${{ env.GOOGLE_APPLICATION_CREDENTIALS }}
           GCP_KEYS_PATH: ${{ env.GCP_KEYS_PATH }}
+          WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
+          WATSONX_API_KEY: ${{ secrets.WATSONX_API_KEY }}
+
         run: |
           echo "Starting service in library mode (1 container)"
           docker compose -f docker-compose-library.yaml up -d
@@ -259,6 +265,13 @@ jobs:
             exit 1
           }
 
+      # watsonx has a different convention than "<provider>/<model>"
+      - name: Set watsonx test overrides
+        if: matrix.environment == 'watsonx'
+        run: |
+          echo "E2E_DEFAULT_MODEL_OVERRIDE=watsonx/watsonx/meta-llama/llama-3-3-70b-instruct" >> $GITHUB_ENV
+          echo "E2E_DEFAULT_PROVIDER_OVERRIDE=watsonx" >> $GITHUB_ENV
+
       - name: Run e2e tests
         env:
           TERM: xterm-256color
```
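
The override step above exists because watsonx model identifiers do not follow the usual `<provider>/<model>` convention: the provider's model IDs already begin with `watsonx/`, so the fully qualified form carries the prefix twice. A minimal sketch of the assumed composition (the helper name is hypothetical, not part of this repo):

```python
def e2e_model_id(provider_id: str, provider_model_id: str) -> str:
    """Hypothetical helper: compose the "<provider>/<model>" ID the e2e tests use."""
    return f"{provider_id}/{provider_model_id}"

# watsonx model IDs already carry their own "watsonx/" prefix,
# so the composed identifier ends up with the prefix twice:
print(e2e_model_id("watsonx", "watsonx/meta-llama/llama-3-3-70b-instruct"))
# -> watsonx/watsonx/meta-llama/llama-3-3-70b-instruct
```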

README.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -123,6 +123,7 @@ Lightspeed Core Stack is based on the FastAPI framework (Uvicorn). The service i
 | OpenAI | https://platform.openai.com |
 | Azure OpenAI | https://azure.microsoft.com/en-us/products/ai-services/openai-service |
 | Google VertexAI| https://cloud.google.com/vertex-ai |
+| IBM WatsonX | https://www.ibm.com/products/watsonx |
 | RHOAI (vLLM) | See tests/e2e-prow/rhoai/configs/run.yaml |
 | RHEL AI (vLLM) | See tests/e2e/configs/run-rhelai.yaml |
 
@@ -178,6 +179,7 @@ __Note__: Support for individual models is dependent on the specific inference p
 | Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini, o4-mini | Yes | remote::azure | [1](examples/azure-run.yaml) |
 | Azure | o1, o1-mini | No | remote::azure | |
 | VertexAI | google/gemini-2.0-flash, google/gemini-2.5-flash, google/gemini-2.5-pro [^1] | Yes | remote::vertexai | [1](examples/vertexai-run.yaml) |
+| WatsonX | meta-llama/llama-3-3-70b-instruct | Yes | remote::watsonx | [1](examples/watsonx-run.yaml) |
 
 [^1]: List of models is limited by design in llama-stack, future versions will probably allow to use more models (see [here](https://github.com/llamastack/llama-stack/blob/release-0.3.x/llama_stack/providers/remote/inference/vertexai/vertexai.py#L54))
```

docker-compose-library.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -34,6 +34,10 @@ services:
       - GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS:-}
       - VERTEX_AI_PROJECT=${VERTEX_AI_PROJECT:-}
       - VERTEX_AI_LOCATION=${VERTEX_AI_LOCATION:-}
+      # WatsonX
+      - WATSONX_BASE_URL=${WATSONX_BASE_URL:-}
+      - WATSONX_PROJECT_ID=${WATSONX_PROJECT_ID:-}
+      - WATSONX_API_KEY=${WATSONX_API_KEY:-}
       # Enable debug logging if needed
       - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-}
     healthcheck:
```

docker-compose.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -32,6 +32,10 @@ services:
       - GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS:-}
      - VERTEX_AI_PROJECT=${VERTEX_AI_PROJECT:-}
       - VERTEX_AI_LOCATION=${VERTEX_AI_LOCATION:-}
+      # WatsonX
+      - WATSONX_BASE_URL=${WATSONX_BASE_URL:-}
+      - WATSONX_PROJECT_ID=${WATSONX_PROJECT_ID:-}
+      - WATSONX_API_KEY=${WATSONX_API_KEY:-}
       # Enable debug logging if needed
       - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-}
     networks:
```

docs/providers.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -55,7 +55,7 @@ The tables below summarize each provider category, containing the following atri
 | tgi | remote | `huggingface_hub`, `aiohttp` ||
 | together | remote | `together` ||
 | vertexai | remote | `google-auth` ||
-| watsonx | remote | `ibm_watsonx_ai` | |
+| watsonx | remote | `litellm` | |
 
 Red Hat providers:
```

examples/watsonx-run.yaml

Lines changed: 168 additions & 0 deletions

```diff
@@ -0,0 +1,168 @@
+version: 2
+
+apis:
+- agents
+- batches
+- datasetio
+- eval
+- files
+- inference
+- safety
+- scoring
+- telemetry
+- tool_runtime
+- vector_io
+
+benchmarks: []
+conversations_store:
+  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/conversations.db}
+  type: sqlite
+datasets: []
+image_name: starter
+# external_providers_dir: /opt/app-root/src/.llama/providers.d
+inference_store:
+  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/inference-store.db}
+  type: sqlite
+metadata_store:
+  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/registry.db}
+  type: sqlite
+
+providers:
+  inference:
+  - provider_id: watsonx
+    provider_type: remote::watsonx
+    config:
+      url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
+      api_key: ${env.WATSONX_API_KEY:=key-not-set}
+      project_id: ${env.WATSONX_PROJECT_ID:=project-not-set}
+      timeout: 1200
+  - provider_id: openai
+    provider_type: remote::openai
+    config:
+      api_key: ${env.OPENAI_API_KEY}
+  - config: {}
+    provider_id: sentence-transformers
+    provider_type: inline::sentence-transformers
+  files:
+  - config:
+      metadata_store:
+        table_name: files_metadata
+        backend: sql_default
+      storage_dir: ${env.SQLITE_STORE_DIR:=~/.llama/storage/files}
+    provider_id: meta-reference-files
+    provider_type: inline::localfs
+  safety:
+  - config:
+      excluded_categories: []
+    provider_id: llama-guard
+    provider_type: inline::llama-guard
+  scoring:
+  - provider_id: basic
+    provider_type: inline::basic
+    config: {}
+  - provider_id: llm-as-judge
+    provider_type: inline::llm-as-judge
+    config: {}
+  - provider_id: braintrust
+    provider_type: inline::braintrust
+    config:
+      openai_api_key: '********'
+  tool_runtime:
+  - config: {} # Enable the RAG tool
+    provider_id: rag-runtime
+    provider_type: inline::rag-runtime
+  vector_io:
+  - config: # Define the storage backend for RAG
+      persistence:
+        namespace: vector_io::faiss
+        backend: kv_default
+    provider_id: faiss
+    provider_type: inline::faiss
+  agents:
+  - config:
+      persistence:
+        agent_state:
+          namespace: agents_state
+          backend: kv_default
+        responses:
+          table_name: agents_responses
+          backend: sql_default
+    provider_id: meta-reference
+    provider_type: inline::meta-reference
+  batches:
+  - config:
+      kvstore:
+        namespace: batches_store
+        backend: kv_default
+    provider_id: reference
+    provider_type: inline::reference
+  datasetio:
+  - config:
+      kvstore:
+        namespace: huggingface_datasetio
+        backend: kv_default
+    provider_id: huggingface
+    provider_type: remote::huggingface
+  - config:
+      kvstore:
+        namespace: localfs_datasetio
+        backend: kv_default
+    provider_id: localfs
+    provider_type: inline::localfs
+  eval:
+  - config:
+      kvstore:
+        namespace: eval_store
+        backend: kv_default
+    provider_id: meta-reference
+    provider_type: inline::meta-reference
+scoring_fns: []
+telemetry:
+  enabled: true
+server:
+  port: 8321
+storage:
+  backends:
+    kv_default: # Define the storage backend type for RAG, in this case registry and RAG are unified i.e. information on registered resources (e.g. models, vector_stores) are saved together with the RAG chunks
+      type: kv_sqlite
+      db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/rag/kv_store.db}
+    sql_default:
+      type: sql_sqlite
+      db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/sql_store.db}
+  stores:
+    metadata:
+      namespace: registry
+      backend: kv_default
+    inference:
+      table_name: inference_store
+      backend: sql_default
+      max_write_queue_size: 10000
+      num_writers: 4
+    conversations:
+      table_name: openai_conversations
+      backend: sql_default
+    prompts:
+      namespace: prompts
+      backend: kv_default
+registered_resources:
+  models:
+  - model_id: custom-watsonx-model
+    provider_id: watsonx
+    model_type: llm
+    provider_model_id: watsonx/meta-llama/llama-3-3-70b-instruct
+  shields:
+  - shield_id: llama-guard
+    provider_id: llama-guard
+    provider_shield_id: openai/gpt-4o-mini
+  vector_dbs: []
+  datasets: []
+  scoring_fns: []
+  benchmarks: []
+  tool_groups:
+  - toolgroup_id: builtin::rag # Register the RAG tool
+    provider_id: rag-runtime
+  vector_stores:
+    default_provider_id: faiss
+    default_embedding_model: # Define the default embedding model for RAG
+      provider_id: sentence-transformers
+      model_id: nomic-ai/nomic-embed-text-v1.5
```
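
The `${env.VAR:=default}` placeholders in this file are resolved against the environment at startup, with the text after `:=` used as a fallback when the variable is unset. A simplified sketch of that substitution, assuming llama-stack resolves these references roughly like this (the real resolver may handle more cases):

```python
import os
import re

# Matches "${env.VAR:=default}" references as used in run.yaml.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):=([^}]*)\}")

def resolve_env_refs(value: str) -> str:
    """Replace each ${env.VAR:=default} with the env value, else its default."""
    return _ENV_REF.sub(lambda m: os.environ.get(m.group(1)) or m.group(2), value)

# With WATSONX_BASE_URL unset, the us-south default endpoint is used:
print(resolve_env_refs("${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}"))
# -> https://us-south.ml.cloud.ibm.com
```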

src/app/endpoints/query.py

Lines changed: 4 additions & 3 deletions

```diff
@@ -20,9 +20,9 @@
     Toolgroup,
     ToolgroupAgentToolGroupWithArgs,
 )
+from llama_stack_client.types.alpha.tool_execution_step import ToolExecutionStep
 from llama_stack_client.types.model_list_response import ModelListResponse
 from llama_stack_client.types.shared.interleaved_content_item import TextContentItem
-from llama_stack_client.types.alpha.tool_execution_step import ToolExecutionStep
 from sqlalchemy.exc import SQLAlchemyError
 
 import constants
@@ -41,8 +41,8 @@
     ForbiddenResponse,
     InternalServerErrorResponse,
     NotFoundResponse,
-    QueryResponse,
     PromptTooLongResponse,
+    QueryResponse,
     QuotaExceededResponse,
     ReferencedDocument,
     ServiceUnavailableResponse,
@@ -543,7 +543,8 @@ def select_model_and_provider_id(
     logger.debug("Searching for model: %s, provider: %s", model_id, provider_id)
     # TODO: Create sepparate validation of provider
     if not any(
-        m.identifier == llama_stack_model_id and m.provider_id == provider_id
+        m.identifier in (llama_stack_model_id, model_id)
+        and m.provider_id == provider_id
         for m in models
     ):
         message = f"Model {model_id} from provider {provider_id} not found in available models"
```
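
The relaxed membership test above accepts a model when either the fully qualified `llama_stack_model_id` or the bare `model_id` matches a registered identifier, which is what lets watsonx's differently-prefixed IDs pass validation. A self-contained sketch of the check, using a simplified stand-in for the client's model type (the registry entry below is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Model:
    # Simplified stand-in for the llama-stack-client model object.
    identifier: str
    provider_id: str

def model_is_known(models: list[Model], llama_stack_model_id: str,
                   model_id: str, provider_id: str) -> bool:
    """Match on either the fully qualified or the bare model identifier."""
    return any(
        m.identifier in (llama_stack_model_id, model_id)
        and m.provider_id == provider_id
        for m in models
    )

# watsonx lists model IDs with their own "watsonx/" prefix, so the bare
# model_id comparison is the one that succeeds here:
models = [Model("watsonx/meta-llama/llama-3-3-70b-instruct", "watsonx")]
assert model_is_known(
    models,
    llama_stack_model_id="watsonx/watsonx/meta-llama/llama-3-3-70b-instruct",
    model_id="watsonx/meta-llama/llama-3-3-70b-instruct",
    provider_id="watsonx",
)
```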
