-
Notifications
You must be signed in to change notification settings - Fork 3k
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
A clear and concise description of what the bug is. When running Kimi-K2.5 on a 4 node cluster it will crash after a few requests. Typically three nodes go to 0% CPU and one goes to 100%.
To Reproduce
Steps to reproduce the behavior:
- Fire up a 4-node Kimi-K2.5 cluster
- Build up some context
- Watch it crash
Expected behavior
Two-node K2.5 clusters are super stable, so I expected a 4-node cluster to be stable and faster.
Actual behavior
The cluster fails, typically with one node at 100% CPU and the rest at 0%
Environment
- macOS Version: macOS 26.3 (25D125) Darwin 25.3.0
- EXO Version: 1.0.68
- Hardware:
- Device 1: Mac Studio M3 Ultra 512GB RAM
- Device 2: Mac Studio M3 Ultra 512GB RAM
- Device 3: Mac Studio M3 Ultra 512GB RAM
- Device 4: Mac Studio M3 Ultra 512GB RAM
- Interconnection:
- Thunderbolt 5 fully cross-wired
- 10GbE Ethernet between all devices
Additional context
N/A
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working