How to reproduce the behaviour
spaCy fails to use the GPU on Cloud Run with an attached NVIDIA L4 GPU.
CUDA and the GPU are available and recognized by the system, but spaCy does not successfully run inference on the GPU.
When GPU usage is explicitly enabled in spaCy, inference fails with a device mismatch error.
Steps to reproduce:
- Run spaCy on Google Cloud Run with an attached NVIDIA L4 GPU.
- Confirm GPU availability (nvidia-smi works, CUDA is installed); a quick check is sketched after this list.
- Enable GPU usage in spaCy (e.g. prefer_gpu / require_gpu).
- Run transformer-based pipeline inference.
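For step 2 above, an availability check along these lines can be run inside the container before spaCy is involved (standard torch and cupy calls; this snippet is not part of the original report):

import torch
import cupy

# Both libraries should report the attached L4 independently of spaCy.
print("torch sees CUDA:", torch.cuda.is_available())
print("torch device count:", torch.cuda.device_count())
print("cupy device count:", cupy.cuda.runtime.getDeviceCount())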
Error (after forcing GPU usage):
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0
Minimal reproduction code:
import spacy
spacy.require_gpu() # or spacy.prefer_gpu()
nlp = spacy.load("en_core_web_trf")
texts = [
    "This is a simple test sentence.",
    "This is another sentence to reproduce the error."
]
docs = list(nlp.pipe(texts, batch_size=8))
print([doc.ents for doc in docs])
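As an additional sanity check (not part of the original snippet), it can help to confirm that Thinc actually switched to CuPy-backed ops after require_gpu(); get_current_ops is a standard thinc.api helper:

import spacy
from thinc.api import get_current_ops

# require_gpu() returns True once a GPU has been allocated for spaCy;
# get_current_ops() should then report CupyOps rather than NumpyOps.
print("require_gpu returned:", spacy.require_gpu())
print("current ops:", get_current_ops())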
Expected behaviour:
- spaCy should successfully use the attached GPU for inference.
- The transformer pipeline should execute fully on CUDA without errors.
Actual behaviour:
- spaCy does not run successfully on the GPU on Cloud Run with an NVIDIA L4.
- When GPU usage is enabled, inference fails with a CPU/CUDA tensor mismatch error.
- Without forcing GPU usage, inference runs on CPU only.
Additional notes:
- The GPU is visible to the runtime (nvidia-smi works).
- CUDA is installed and available.
- The issue appears only in GPU mode.
- CPU-only execution works correctly.
- This suggests incomplete or inconsistent device placement when running on GPU; a torch-only illustration of this error class is sketched below.
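For reference, the same class of error can be triggered with plain torch by leaving an index tensor on the CPU while the data lives on CUDA; this is only an illustration of the error message, not spaCy's actual code path:

import torch

data = torch.randn(4, 8, device="cuda")  # "model" tensors on the GPU
index = torch.tensor([0, 2])             # index tensor left on the CPU
try:
    torch.index_select(data, 0, index)   # fails with a device-mismatch RuntimeError
except RuntimeError as err:
    print(err)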
Your Environment
Operating System:
Linux
Python Version Used:
3.13
spaCy Version Used:
3.8.0
Environment Information:
Running on Google Cloud Run with NVIDIA L4 GPU
Relevant installed packages:
- spacy>=3.8.0
- cupy-cuda12x>=13.4.0
- torch>=2.6.0
- transformers>=4.44.2,<4.58.0
- en_core_web_trf==3.8.0
Full traceback:
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0