[ROCm] please install 'torchcodec' #7914

@AndreasKaratzas

Description

Describe the bug

The datasets library is widely used across the Python ecosystem and is therefore a dependency of many projects, including vLLM on ROCm. During audio dataset tests, the following exception is triggered:

    def decode_example(
        self, value: dict, token_per_repo_id: Optional[dict[str, Union[str, bool, None]]] = None
    ) -> "AudioDecoder":
        """Decode example audio file into audio data.

        Args:
            value (`dict`):
                A dictionary with keys:

                - `path`: String with relative audio file path.
                - `bytes`: Bytes of the audio file.
            token_per_repo_id (`dict`, *optional*):
                To access and decode
                audio files from private repositories on the Hub, you can pass
                a dictionary repo_id (`str`) -> token (`bool` or `str`)

        Returns:
            `torchcodec.decoders.AudioDecoder`
        """
        if config.TORCHCODEC_AVAILABLE:
            from ._torchcodec import AudioDecoder
        else:
>           raise ImportError("To support decoding audio data, please install 'torchcodec'.")
E           ImportError: To support decoding audio data, please install 'torchcodec'.

At the same time, torchcodec cannot be installed on ROCm: its GPU acceleration relies on NVIDIA's NVDEC hardware decoder, which is NVIDIA-specific. As a result, any code path that reaches this block fails on ROCm. Could you fall back to an alternative package here instead of raising an ImportError?
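To illustrate the kind of fallback requested, here is a minimal sketch (hypothetical, not datasets API): prefer torchcodec when it imports, otherwise hand back a CPU-only stand-in. The stand-in below handles only uncompressed WAV via the stdlib `wave` module; a real fallback would use a general-purpose decoder such as soundfile, which datasets relied on before torchcodec.

```python
import io
import wave


def get_audio_decoder():
    """Sketch of the requested soft gate: try torchcodec, else a CPU fallback."""
    try:
        # NVDEC-backed decoder; wheels are CUDA-only, so this import fails on ROCm.
        from torchcodec.decoders import AudioDecoder
        return AudioDecoder
    except ImportError:
        return decode_wav_fallback


def decode_wav_fallback(audio_bytes: bytes) -> dict:
    """CPU-only stand-in: decode uncompressed WAV bytes with the stdlib."""
    with wave.open(io.BytesIO(audio_bytes)) as f:
        return {
            "sampling_rate": f.getframerate(),
            "num_channels": f.getnchannels(),
            "frames": f.readframes(f.getnframes()),
        }
```

This keeps the decoded-sample shape loosely aligned with what callers of `decode_example` expect, without requiring any NVIDIA-specific package.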

Steps to reproduce the bug

On a machine with MI300/MI325/MI355:

pytest -s -v tests/entrypoints/openai/correctness/test_transcription_api_correctness.py::test_wer_correctness[12.74498-D4nt3/esb-datasets-earnings22-validation-tiny-filtered-openai/whisper-large-v3]
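Until datasets offers a fallback, the repro can guard itself. A stdlib-only availability probe, mirroring the `config.TORCHCODEC_AVAILABLE` gate shown in the traceback (in the vLLM test, `pytest.skip` would be the natural action; a plain exception keeps this sketch self-contained):

```python
import importlib.util

# True only when the torchcodec wheel is importable; on ROCm machines it
# cannot be installed, so this evaluates to False.
TORCHCODEC_AVAILABLE = importlib.util.find_spec("torchcodec") is not None


def require_torchcodec():
    """Guard a test could call before load_hf_dataset, so the failure is
    an explicit skip/error at the top of the test rather than a decode-time
    ImportError deep inside datasets."""
    if not TORCHCODEC_AVAILABLE:
        raise RuntimeError("torchcodec unavailable; audio decoding would fail")
```

Alternatively, `dataset.cast_column("audio", Audio(decode=False))` keeps the raw `{"path", "bytes"}` payload and never enters the torchcodec decode path at all (assuming the audio column is named `audio`), at the cost of decoding the bytes yourself.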

Expected behavior

_________________________________________________ test_wer_correctness[12.74498-D4nt3/esb-datasets-earnings22-validation-tiny-filtered-openai/whisper-large-v3] _________________________________________________

model_name = 'openai/whisper-large-v3', dataset_repo = 'D4nt3/esb-datasets-earnings22-validation-tiny-filtered', expected_wer = 12.74498, n_examples = -1, max_concurrent_request = None

    @pytest.mark.parametrize("model_name", ["openai/whisper-large-v3"])
    # Original dataset is 20GB+ in size, hence we use a pre-filtered slice.
    @pytest.mark.parametrize(
        "dataset_repo", ["D4nt3/esb-datasets-earnings22-validation-tiny-filtered"]
    )
    # NOTE: Expected WER measured with equivalent hf.transformers args:
    # whisper-large-v3 + esb-datasets-earnings22-validation-tiny-filtered.
    @pytest.mark.parametrize("expected_wer", [12.744980])
    def test_wer_correctness(
        model_name, dataset_repo, expected_wer, n_examples=-1, max_concurrent_request=None
    ):
        # TODO refactor to use `ASRDataset`
        with RemoteOpenAIServer(model_name, ["--enforce-eager"]) as remote_server:
>           dataset = load_hf_dataset(dataset_repo)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests/entrypoints/openai/correctness/test_transcription_api_correctness.py:160:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/entrypoints/openai/correctness/test_transcription_api_correctness.py:111: in load_hf_dataset
    if "duration_ms" not in dataset[0]:
                            ^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/arrow_dataset.py:2876: in __getitem__
    return self._getitem(key)
           ^^^^^^^^^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/arrow_dataset.py:2858: in _getitem
    formatted_output = format_table(
/usr/local/lib/python3.12/dist-packages/datasets/formatting/formatting.py:658: in format_table
    return formatter(pa_table, query_type=query_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/formatting/formatting.py:411: in __call__
    return self.format_row(pa_table)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/formatting/formatting.py:460: in format_row
    row = self.python_features_decoder.decode_row(row)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/formatting/formatting.py:224: in decode_row
    return self.features.decode_example(row, token_per_repo_id=self.token_per_repo_id) if self.features else row
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/features/features.py:2111: in decode_example
    column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/local/lib/python3.12/dist-packages/datasets/features/features.py:1419: in decode_nested_example
    return schema.decode_example(obj, token_per_repo_id=token_per_repo_id) if obj is not None else None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment info

  • datasets version: 4.4.2
  • Platform: Linux-5.15.0-161-generic-x86_64-with-glibc2.35
  • Python version: 3.12.12
  • huggingface_hub version: 0.36.0
  • PyArrow version: 22.0.0
  • Pandas version: 2.3.3
  • fsspec version: 2025.10.0
