[Feature]: Support for non-NVIDIA backends (e.g., Iluvatar CoreX) #667

@SawanoHao

Description

🚀 The feature, motivation and pitch

Background

I am very interested in the Unified Cache Management (UCM) project. It currently provides an excellent way to persist and reuse KV cache to speed up LLM inference. Given today's diverse hardware landscape, adapting this kind of cache-management logic to domestic Chinese GPUs (such as Iluvatar CoreX / 天数智芯) is becoming increasingly important.

Question

I would like to know whether there are any plans or architectural considerations for supporting non-NVIDIA backends. Specifically:

  1. Hardware Abstraction: Does the current implementation of UCM heavily rely on NVIDIA-specific features (e.g., CUDA VMM API, specific NVLink behaviors)?
  2. Framework Dependency: Does UCM require a specific version of vLLM or other engines that are strictly tied to CUDA?
  3. Porting Effort: In your opinion, which modules would most need to be rewritten or abstracted to support the Iluvatar CoreX software stack (which uses the DeepLink/CoreX SDK)?

I have access to Iluvatar CoreX hardware and would love to hear your thoughts on the feasibility of this adaptation.
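To make the "hardware abstraction" question more concrete, here is a minimal sketch of the kind of device-memory backend interface that could isolate vendor-specific code (CUDA VMM on NVIDIA, the CoreX SDK elsewhere). All names here are hypothetical illustrations, not UCM's actual API; a CPU stand-in backend is included so the interface itself can be exercised without any GPU.

```python
from abc import ABC, abstractmethod

class KVCacheBackend(ABC):
    """Hypothetical vendor-neutral interface for KV-cache device memory.

    A CUDA implementation might wrap the VMM API (cuMemCreate/cuMemMap);
    an Iluvatar implementation would wrap the CoreX SDK equivalents.
    """

    @abstractmethod
    def alloc(self, nbytes: int) -> int:
        """Reserve nbytes of device memory; return an opaque handle."""

    @abstractmethod
    def free(self, handle: int) -> None:
        """Release a previously allocated block."""

    @abstractmethod
    def copy_to_host(self, handle: int, nbytes: int) -> bytes:
        """Copy a cached block back to host memory for persistence."""


class HostSimBackend(KVCacheBackend):
    """CPU stand-in backend, useful for testing the abstraction itself."""

    def __init__(self) -> None:
        self._blocks: dict[int, bytearray] = {}
        self._next_handle = 0

    def alloc(self, nbytes: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = bytearray(nbytes)
        return handle

    def free(self, handle: int) -> None:
        del self._blocks[handle]

    def copy_to_host(self, handle: int, nbytes: int) -> bytes:
        return bytes(self._blocks[handle][:nbytes])


# Engine code would depend only on the interface, never on a vendor SDK:
backend: KVCacheBackend = HostSimBackend()
h = backend.alloc(4096)
snapshot = backend.copy_to_host(h, 16)
backend.free(h)
```

If UCM's device interactions are already funneled through a seam like this, a CoreX port would mainly mean adding one new backend class; if CUDA calls are scattered through the cache-management logic, the porting effort would be substantially larger.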

Alternatives

No response

Additional context

No response
