Skip to content

[BUG] Kimi-K2.5 fails on 4 node cluster #1764

@phil-brass

Description

@phil-brass

Describe the bug

A clear and concise description of what the bug is. When running Kimi-K2.5 on a 4 node cluster it will crash after a few requests. Typically three nodes go to 0% CPU and one goes to 100%.

To Reproduce

Steps to reproduce the behavior:

  1. Fire up a 4-node Kimi-K2.5 cluster
  2. Build up some context
  3. Watch it crash

Expected behavior

Two-node K2.5 clusters are super stable, so I expected a 4-node cluster to be stable and faster.

Actual behavior

The cluster fails, typically with one node at 100% CPU and the rest at 0%

Environment

  • macOS Version: macOS 26.3 (25D125) Darwin 25.3.0
  • EXO Version: 1.0.68
  • Hardware:
    • Device 1: Mac Studio M3 Ultra 512GB RAM
    • Device 2: Mac Studio M3 Ultra 512GB RAM
    • Device 3: Mac Studio M3 Ultra 512GB RAM
    • Device 4: Mac Studio M3 Ultra 512GB RAM
  • Interconnection:
    • Thunderbolt 5 fully cross-wired
    • 10GbE Ethernet between all devices

Additional context

N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions