Description
🚀 The feature, motivation and pitch
Background
I am very interested in the Unified Cache Management (UCM) project, which provides an excellent way to persist and reuse the KV cache to speed up LLM inference. As hardware becomes more diverse, adapting this cache management logic to domestic Chinese GPUs (such as Iluvatar CoreX / 天数智芯) is becoming increasingly important.
Question
I would like to know if there are any plans or architectural considerations for supporting non-NVIDIA backends. Specifically:
- Hardware Abstraction: Does the current implementation of UCM heavily rely on NVIDIA-specific features (e.g., CUDA VMM API, specific NVLink behaviors)?
- Framework Dependency: Does UCM require a specific version of vLLM or other engines that are strictly tied to CUDA?
- Porting Effort: In your opinion, which modules would most need to be rewritten or abstracted to support the Iluvatar CoreX software stack (which uses the DeepLink/CoreX SDK)?
I have access to Iluvatar CoreX hardware and would love to hear your thoughts on the feasibility of this adaptation.
Alternatives
No response
Additional context
No response