
en_core_web_trf does not run on GPU even when CUDA and GPU are detected #13917

@lizush23

Description


How to reproduce the behaviour

spaCy fails to use the GPU on Cloud Run with an attached NVIDIA L4 GPU.
CUDA and the GPU are available and recognized by the system, but spaCy does not successfully execute inference on the GPU.

When GPU usage is explicitly enabled in spaCy, inference fails with a device mismatch error.

Steps to reproduce:

  1. Run spaCy on Google Cloud Run with an attached NVIDIA L4 GPU.
  2. Confirm GPU availability (nvidia-smi works, CUDA is installed).
  3. Enable GPU usage in spaCy (e.g. prefer_gpu / require_gpu).
  4. Run transformer-based pipeline inference.

Error (after forcing GPU usage):
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0

Minimal reproduction code:

import spacy

spacy.require_gpu()  # or spacy.prefer_gpu()

nlp = spacy.load("en_core_web_trf")

texts = [
    "This is a simple test sentence.",
    "This is another sentence to reproduce the error.",
]

docs = list(nlp.pipe(texts, batch_size=8))
print([doc.ents for doc in docs])
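As a sanity check, it may also be worth printing which backend thinc actually selected after the `require_gpu()` call. `thinc.api.get_current_ops` reports `CupyOps` when the GPU backend is active and `NumpyOps` otherwise (the snippet is guarded so it also runs where thinc is absent):

```python
def current_backend():
    """Name of thinc's active ops class, or None if thinc is unavailable."""
    try:
        from thinc.api import get_current_ops
    except ImportError:
        return None
    return type(get_current_ops()).__name__

backend = current_backend()
print("active thinc backend:", backend)  # "CupyOps" expected on a working GPU setup
```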

Expected behaviour:

  • spaCy should successfully use the attached GPU for inference.
  • Transformer pipeline should execute fully on CUDA without errors.

Actual behaviour:

  • spaCy does not run successfully on GPU on Cloud Run with NVIDIA L4.
  • When GPU usage is enabled, inference fails with a CPU/CUDA tensor mismatch error.
  • Without forcing GPU usage, inference runs on CPU only.

Additional notes:

  • The GPU is visible to the runtime (nvidia-smi works).
  • CUDA is installed and available.
  • The issue appears only in GPU mode.
  • CPU-only execution works correctly.
  • This suggests incomplete or inconsistent device placement when running on GPU.
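For completeness, the container environment variables that commonly govern GPU visibility under NVIDIA-enabled container runtimes can be dumped alongside the notes above (a stdlib-only sketch; which of these variables matters in this exact Cloud Run deployment is an assumption):

```python
import os

# Variables commonly involved in GPU visibility inside NVIDIA-enabled
# containers; unset values are reported explicitly.
VARS = ("CUDA_VISIBLE_DEVICES", "NVIDIA_VISIBLE_DEVICES", "LD_LIBRARY_PATH")

env_report = {var: os.environ.get(var, "<unset>") for var in VARS}
for var, value in env_report.items():
    print(f"{var}={value}")
```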

Your Environment

Operating System:
Linux

Python Version Used:
3.13

spaCy Version Used:
3.8.0

Environment Information:
Running on Google Cloud Run with NVIDIA L4 GPU

Relevant installed packages:

  • spacy>=3.8.0
  • cupy-cuda12x>=13.4.0
  • torch>=2.6.0
  • transformers>=4.44.2,<4.58.0
  • en_core_web_trf==3.8.0

Full traceback:
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0
