Commit 9dbf66b

Support watsonx provider

Added model and provider overrides to the e2e tests; patched the conversation tests to include model and provider in the call.

1 parent 23dc4d1 commit 9dbf66b

File tree

12 files changed: +388 −33 lines
.github/workflows/e2e_tests.yaml

Lines changed: 12 additions & 1 deletion

@@ -10,7 +10,7 @@ jobs:
       fail-fast: false
       matrix:
         mode: ["server", "library"]
-        environment: ["ci", "azure", "vertexai"]
+        environment: ["ci", "azure", "vertexai", "watsonx"]

     name: "E2E: ${{ matrix.mode }} mode / ${{ matrix.environment }}"

@@ -200,6 +200,8 @@ jobs:
           VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
           GOOGLE_APPLICATION_CREDENTIALS: ${{ env.GOOGLE_APPLICATION_CREDENTIALS }}
           GCP_KEYS_PATH: ${{ env.GCP_KEYS_PATH }}
+          WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
+          WATSONX_API_KEY: ${{ secrets.WATSONX_API_KEY }}
         run: |
           # Debug: Check if environment variable is available for docker-compose
           echo "OPENAI_API_KEY is set: $([ -n "$OPENAI_API_KEY" ] && echo 'YES' || echo 'NO')"

@@ -226,6 +228,8 @@ jobs:
           VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
           GOOGLE_APPLICATION_CREDENTIALS: ${{ env.GOOGLE_APPLICATION_CREDENTIALS }}
           GCP_KEYS_PATH: ${{ env.GCP_KEYS_PATH }}
+          WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
+          WATSONX_API_KEY: ${{ secrets.WATSONX_API_KEY }}
         run: |
           echo "Starting service in library mode (1 container)"
           docker compose -f docker-compose-library.yaml up -d

@@ -256,6 +260,13 @@ jobs:
             exit 1
           }

+      # watsonx has a different convention than "<provider>/<model>"
+      - name: Set watsonx test overrides
+        if: matrix.environment == 'watsonx'
+        run: |
+          echo "E2E_DEFAULT_MODEL_OVERRIDE=watsonx/watsonx/meta-llama/llama-3-3-70b-instruct" >> $GITHUB_ENV
+          echo "E2E_DEFAULT_PROVIDER_OVERRIDE=watsonx" >> $GITHUB_ENV
+
       - name: Run e2e tests
         env:
           TERM: xterm-256color
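The override step above exists because watsonx model identifiers do not follow the usual "<provider>/<model>" convention the e2e defaults assume. A minimal sketch (illustrative only, not part of the commit; the helper names are made up) of the two naming shapes:

```python
# Hypothetical helpers illustrating the two identifier conventions.
# Most providers expose models as "<provider>/<model>"; the watsonx
# provider's model ids themselves start with "watsonx/", so the fully
# qualified identifier carries the provider prefix twice.

def default_model_id(provider: str, model: str) -> str:
    """Usual "<provider>/<model>" convention used by the e2e defaults."""
    return f"{provider}/{model}"


def watsonx_model_id(model: str) -> str:
    """watsonx: provider prefix appears twice in the full identifier."""
    return f"watsonx/watsonx/{model}"


print(default_model_id("vertexai", "google/gemini-2.5-flash"))
# The override value set in the workflow step above:
print(watsonx_model_id("meta-llama/llama-3-3-70b-instruct"))
```

This is why the workflow writes `E2E_DEFAULT_MODEL_OVERRIDE=watsonx/watsonx/meta-llama/llama-3-3-70b-instruct` rather than relying on the default convention.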

README.md

Lines changed: 2 additions & 0 deletions

@@ -122,6 +122,7 @@ Lightspeed Core Stack is based on the FastAPI framework (Uvicorn). The service i
 | OpenAI | https://platform.openai.com |
 | Azure OpenAI | https://azure.microsoft.com/en-us/products/ai-services/openai-service |
 | Google VertexAI| https://cloud.google.com/vertex-ai |
+| IBM WatsonX | https://www.ibm.com/products/watsonx |
 | RHOAI (vLLM) | See tests/e2e-prow/rhoai/configs/run.yaml |
 | RHEL AI (vLLM) | See tests/e2e/configs/run-rhelai.yaml |

@@ -177,6 +178,7 @@ __Note__: Support for individual models is dependent on the specific inference p
 | Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini, o4-mini | Yes | remote::azure | [1](examples/azure-run.yaml) |
 | Azure | o1, o1-mini | No | remote::azure | |
 | VertexAI | google/gemini-2.0-flash, google/gemini-2.5-flash, google/gemini-2.5-pro [^1] | Yes | remote::vertexai | [1](examples/vertexai-run.yaml) |
+| WatsonX | meta-llama/llama-3-3-70b-instruct | Yes | remote::watsonx | [1](examples/watsonx-run.yaml) |

 [^1]: List of models is limited by design in llama-stack, future versions will probably allow to use more models (see [here](https://github.com/llamastack/llama-stack/blob/release-0.3.x/llama_stack/providers/remote/inference/vertexai/vertexai.py#L54))
docker-compose-library.yaml

Lines changed: 4 additions & 0 deletions

@@ -34,6 +34,10 @@ services:
       - GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS:-}
       - VERTEX_AI_PROJECT=${VERTEX_AI_PROJECT:-}
       - VERTEX_AI_LOCATION=${VERTEX_AI_LOCATION:-}
+      # WatsonX
+      - WATSONX_BASE_URL=${WATSONX_BASE_URL:-}
+      - WATSONX_PROJECT_ID=${WATSONX_PROJECT_ID:-}
+      - WATSONX_API_KEY=${WATSONX_API_KEY:-}
       # Enable debug logging if needed
       - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-}
     healthcheck:

docker-compose.yaml

Lines changed: 4 additions & 0 deletions

@@ -32,6 +32,10 @@ services:
       - GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS:-}
       - VERTEX_AI_PROJECT=${VERTEX_AI_PROJECT:-}
       - VERTEX_AI_LOCATION=${VERTEX_AI_LOCATION:-}
+      # WatsonX
+      - WATSONX_BASE_URL=${WATSONX_BASE_URL:-}
+      - WATSONX_PROJECT_ID=${WATSONX_PROJECT_ID:-}
+      - WATSONX_API_KEY=${WATSONX_API_KEY:-}
       # Enable debug logging if needed
       - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-}
     networks:
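The compose entries above use `${VAR:-}` interpolation: when a variable is unset or empty, the fallback after `:-` (here, the empty string) is substituted, so the stack still starts when watsonx credentials are not configured. A quick shell sketch of the semantics (the fallback URL below is just for illustration):

```shell
# ${VAR:-fallback}: use the variable if set and non-empty, else the fallback.
unset WATSONX_BASE_URL
echo "unset  -> [${WATSONX_BASE_URL:-}]"
echo "unset  -> [${WATSONX_BASE_URL:-https://example.invalid}]"

WATSONX_BASE_URL=https://us-south.ml.cloud.ibm.com
echo "set    -> [${WATSONX_BASE_URL:-ignored-fallback}]"
```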

docs/providers.md

Lines changed: 1 addition & 1 deletion

@@ -55,7 +55,7 @@ The tables below summarize each provider category, containing the following atri
 | tgi | remote | `huggingface_hub`, `aiohttp` ||
 | together | remote | `together` ||
 | vertexai | remote | `google-auth` ||
-| watsonx | remote | `ibm_watsonx_ai` | |
+| watsonx | remote | `litellm` | |

 Red Hat providers:
examples/watsonx-run.yaml

Lines changed: 161 additions & 0 deletions (new file)

version: 2

apis:
- agents
- batches
- datasetio
- eval
- files
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io

benchmarks: []
conversations_store:
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/conversations.db}
  type: sqlite
datasets: []
image_name: starter
# external_providers_dir: /opt/app-root/src/.llama/providers.d
inference_store:
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/inference-store.db}
  type: sqlite
metadata_store:
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/registry.db}
  type: sqlite

providers:
  inference:
  - provider_id: watsonx
    provider_type: remote::watsonx
    config:
      url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
      api_key: ${env.WATSONX_API_KEY:=key-not-set}
      project_id: ${env.WATSONX_PROJECT_ID:=project-not-set}
      timeout: 1200
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  files:
  - config:
      metadata_store:
        table_name: files_metadata
        backend: sql_default
      storage_dir: ${env.SQLITE_STORE_DIR:=~/.llama/storage/files}
    provider_id: meta-reference-files
    provider_type: inline::localfs
  safety: [] # WARNING: Shields disabled due to infinite loop issue with LLM calls
  # - config:
  #     excluded_categories: []
  #   provider_id: llama-guard
  #   provider_type: inline::llama-guard
  scoring:
  - provider_id: basic
    provider_type: inline::basic
    config: {}
  - provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
    config: {}
  - provider_id: braintrust
    provider_type: inline::braintrust
    config:
      openai_api_key: '********'
  tool_runtime:
  - config: {} # Enable the RAG tool
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config: # Define the storage backend for RAG
      persistence:
        namespace: vector_io::faiss
        backend: kv_default
    provider_id: faiss
    provider_type: inline::faiss
  agents:
  - config:
      persistence:
        agent_state:
          namespace: agents_state
          backend: kv_default
        responses:
          table_name: agents_responses
          backend: sql_default
    provider_id: meta-reference
    provider_type: inline::meta-reference
  batches:
  - config:
      kvstore:
        namespace: batches_store
        backend: kv_default
    provider_id: reference
    provider_type: inline::reference
  datasetio:
  - config:
      kvstore:
        namespace: huggingface_datasetio
        backend: kv_default
    provider_id: huggingface
    provider_type: remote::huggingface
  - config:
      kvstore:
        namespace: localfs_datasetio
        backend: kv_default
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config:
      kvstore:
        namespace: eval_store
        backend: kv_default
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
telemetry:
  enabled: true
server:
  port: 8321
storage:
  backends:
    kv_default: # Define the storage backend type for RAG, in this case registry and RAG are unified i.e. information on registered resources (e.g. models, vector_stores) are saved together with the RAG chunks
      type: kv_sqlite
      db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/rag/kv_store.db}
    sql_default:
      type: sql_sqlite
      db_path: ${env.SQLITE_STORE_DIR:=~/.llama/storage/sql_store.db}
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default
    prompts:
      namespace: prompts
      backend: kv_default
registered_resources:
  models:
  - model_id: custom-watsonx-model
    provider_id: watsonx
    model_type: llm
    provider_model_id: watsonx/meta-llama/llama-3-3-70b-instruct
  shields: [] # WARNING: Shields disabled due to infinite loop issue with LLM calls
  vector_dbs: []
  datasets: []
  scoring_fns: []
  benchmarks: []
  tool_groups:
  - toolgroup_id: builtin::rag # Register the RAG tool
    provider_id: rag-runtime
vector_stores:
  default_provider_id: faiss
  default_embedding_model: # Define the default embedding model for RAG
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
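The config above leans on `${env.VAR:=default}` placeholders for the watsonx URL, key, and project id. A minimal sketch of how such a placeholder resolves (this mimics the observable behavior, environment value if set, inline default otherwise; it is not llama-stack's actual substitution code):

```python
import os
import re

# Matches ${env.VAR:=default} and captures the variable name and default.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z0-9_]+):=([^}]*)\}")


def resolve(value: str) -> str:
    """Replace each ${env.VAR:=default} with the env value or its default."""
    return _PLACEHOLDER.sub(
        lambda m: os.environ.get(m.group(1), m.group(2)), value
    )


os.environ.pop("WATSONX_BASE_URL", None)
# Variable unset -> the inline default is used:
print(resolve("${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}"))
```

With `WATSONX_API_KEY` and `WATSONX_PROJECT_ID` unset, the config therefore falls back to the sentinel values `key-not-set` and `project-not-set` rather than failing to parse.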

src/app/endpoints/query.py

Lines changed: 4 additions & 3 deletions

@@ -20,9 +20,9 @@
     Toolgroup,
     ToolgroupAgentToolGroupWithArgs,
 )
+from llama_stack_client.types.alpha.tool_execution_step import ToolExecutionStep
 from llama_stack_client.types.model_list_response import ModelListResponse
 from llama_stack_client.types.shared.interleaved_content_item import TextContentItem
-from llama_stack_client.types.alpha.tool_execution_step import ToolExecutionStep
 from sqlalchemy.exc import SQLAlchemyError

 import constants

@@ -41,8 +41,8 @@
     ForbiddenResponse,
     InternalServerErrorResponse,
     NotFoundResponse,
-    QueryResponse,
     PromptTooLongResponse,
+    QueryResponse,
     QuotaExceededResponse,
     ReferencedDocument,
     ServiceUnavailableResponse,

@@ -540,7 +540,8 @@ def select_model_and_provider_id(
     logger.debug("Searching for model: %s, provider: %s", model_id, provider_id)
     # TODO: Create sepparate validation of provider
     if not any(
-        m.identifier == llama_stack_model_id and m.provider_id == provider_id
+        m.identifier in (llama_stack_model_id, model_id)
+        and m.provider_id == provider_id
         for m in models
     ):
         message = f"Model {model_id} from provider {provider_id} not found in available models"
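The last hunk relaxes the model lookup: a model now matches on either the fully qualified `"<provider>/<model>"` identifier or the requested model id itself, which is what watsonx's double-prefixed identifiers need. A self-contained sketch of the new predicate (the `Model` dataclass is a stand-in for the llama-stack client model type, and `model_available` is a hypothetical wrapper around the `any(...)` expression from the diff):

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Stand-in for the llama-stack client model object."""
    identifier: str
    provider_id: str


def model_available(models: list[Model], model_id: str, provider_id: str) -> bool:
    """Mirror of the updated check in select_model_and_provider_id."""
    llama_stack_model_id = f"{provider_id}/{model_id}"
    return any(
        m.identifier in (llama_stack_model_id, model_id)
        and m.provider_id == provider_id
        for m in models
    )


# watsonx registers "watsonx/watsonx/meta-llama/llama-3-3-70b-instruct";
# prefixing it again with the provider would never match under the old
# `m.identifier == llama_stack_model_id` check, but the bare model_id does.
models = [Model("watsonx/watsonx/meta-llama/llama-3-3-70b-instruct", "watsonx")]
print(model_available(models, "watsonx/watsonx/meta-llama/llama-3-3-70b-instruct", "watsonx"))
```

The ordinary `"<provider>/<model>"` path still works unchanged, since `llama_stack_model_id` remains the first candidate in the membership test.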
