What you would like to be added?
- Add koord-scheduler as a supported Grove scheduler backend so that Grove workloads can use Koordinator for gang scheduling.
- When this backend is selected, Grove PodGangs drive Koordinator's gang scheduling mechanism, and PodCliqueSet configurations incompatible with Koordinator (e.g. MNNVL) are rejected at admission.
Why is this needed?
Grove currently supports two scheduler backends: kai-scheduler and default-scheduler. Koordinator is a CNCF-hosted, widely deployed AI/ML-enhanced scheduler that is commonly found in production GPU clusters alongside Grove. Users in these environments today have no supported path to use Grove with Koordinator for gang scheduling.
Adding a koord-scheduler backend gives Grove users a third scheduling option and enables end-to-end gang scheduling (all-or-nothing pod placement) on clusters where Koordinator is already the scheduler of record.