How to reproduce the behaviour
spaCy fails to use the GPU on Cloud Run with an attached NVIDIA L4 GPU.
CUDA and the GPU are available and recognized by the system, but spaCy does not successfully run inference on the GPU.
When GPU usage is explicitly enabled in spaCy, inference fails with a device mismatch error.
Steps to reproduce:
- Run spaCy on Google Cloud Run with an attached NVIDIA L4 GPU.
- Confirm GPU availability (nvidia-smi works, CUDA is installed); a quick check is sketched after this list.
- Enable GPU usage in spaCy (e.g. prefer_gpu / require_gpu).
- Run transformer-based pipeline inference.
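For step 2 above, an availability check along these lines can be run inside the container before spaCy is involved (standard torch and cupy calls; this snippet is not part of the original report):

import torch
import cupy

# Both libraries should report the attached L4 independently of spaCy.
print("torch sees CUDA:", torch.cuda.is_available())
print("torch device count:", torch.cuda.device_count())
print("cupy device count:", cupy.cuda.runtime.getDeviceCount())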
Error (after forcing GPU usage):
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0
Minimal reproduction code:
import spacy
spacy.require_gpu() # or spacy.prefer_gpu()
nlp = spacy.load("en_core_web_trf")
texts = [
    "This is a simple test sentence.",
    "This is another sentence to reproduce the error."
]
docs = list(nlp.pipe(texts, batch_size=8))
print([doc.ents for doc in docs])
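As an additional sanity check (not part of the original snippet), it can help to confirm that Thinc actually switched to CuPy-backed ops after require_gpu(); get_current_ops is a standard thinc.api helper:

import spacy
from thinc.api import get_current_ops

# require_gpu() returns True once a GPU has been allocated for spaCy;
# get_current_ops() should then report CupyOps rather than NumpyOps.
print("require_gpu returned:", spacy.require_gpu())
print("current ops:", get_current_ops())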
Expected behaviour:
- spaCy should successfully use the attached GPU for inference.
- The transformer pipeline should execute fully on CUDA without errors.
Actual behaviour:
- spaCy does not run successfully on the GPU on Cloud Run with an NVIDIA L4.
- When GPU usage is enabled, inference fails with a CPU/CUDA tensor mismatch error.
- Without forcing GPU usage, inference runs on CPU only.
Additional notes:
- The GPU is visible to the runtime (nvidia-smi works).
- CUDA is installed and available.
- The issue appears only in GPU mode.
- CPU-only execution works correctly.
- This suggests incomplete or inconsistent device placement when running on GPU; a torch-only illustration of this error class is sketched below.
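For reference, the same class of error can be triggered with plain torch by leaving an index tensor on the CPU while the data lives on CUDA; this is only an illustration of the error message, not spaCy's actual code path:

import torch

data = torch.randn(4, 8, device="cuda")  # "model" tensors on the GPU
index = torch.tensor([0, 2])             # index tensor left on the CPU
try:
    torch.index_select(data, 0, index)   # fails with a device-mismatch RuntimeError
except RuntimeError as err:
    print(err)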
Your Environment
Operating System:
Linux
Python Version Used:
3.13
spaCy Version Used:
3.8.0
Environment Information:
Running on Google Cloud Run with NVIDIA L4 GPU
Relevant installed packages:
- spacy>=3.8.0
- cupy-cuda12x>=13.4.0
- torch>=2.6.0
- transformers>=4.44.2,<4.58.0
- en_core_web_trf==3.8.0
Full traceback:
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0