[RFC] MiniMax-M2.5 FP8 — Marathon Optimized (MI355X) #3192

@peymanr

Description

This issue tracks a series of three pull requests targeting ROCm/aiter.

Status: PRs being prepared — full description will be added shortly.

  • PR 1: [Perf][Kernel] Add decode buffer caches to eliminate per-step HIP malloc in fused_moe
  • PR 2: [Perf][Kernel] Add gfx950 1-stage ASM fast path for FP8 blockscale decode (ntok<=512)
  • PR 3: [Kernel][Perf] Add MiniMax-M2.5 GEMM and FMoE tuning configs for gfx950
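
Until the full description lands, here is a rough sketch of the general idea behind PR 1: keep scratch buffers alive across decode steps instead of allocating them on every call. This is not the aiter implementation. `DecodeBufferCache`, `fake_alloc`, and the `"moe_out"` buffer name are hypothetical, and a plain `bytearray` stands in for a device allocation so the snippet runs without a GPU.

```python
from typing import Callable, Dict, Tuple

# Cache key: (buffer name, shape, dtype). All names here are hypothetical.
Key = Tuple[str, Tuple[int, ...], str]


class DecodeBufferCache:
    """Reuse buffers across decode steps, keyed by name, shape, and dtype."""

    def __init__(self, alloc: Callable[[Tuple[int, ...], str], object]):
        self._alloc = alloc              # stand-in for the real device allocator
        self._buffers: Dict[Key, object] = {}
        self.alloc_calls = 0             # counts real allocations, for illustration

    def get(self, name: str, shape: Tuple[int, ...], dtype: str) -> object:
        key = (name, tuple(shape), dtype)
        buf = self._buffers.get(key)
        if buf is None:
            # First request for this key: allocate once and keep the buffer.
            buf = self._alloc(key[1], dtype)
            self._buffers[key] = buf
            self.alloc_calls += 1
        return buf


def fake_alloc(shape: Tuple[int, ...], dtype: str) -> bytearray:
    """Host-side placeholder for a device malloc (assumes 1 byte per element)."""
    n = 1
    for dim in shape:
        n *= dim
    return bytearray(n)


cache = DecodeBufferCache(fake_alloc)
for _ in range(4):                       # four simulated decode steps
    out = cache.get("moe_out", (8, 16), "fp8")
print(cache.alloc_calls)
```

In steady-state decode the shapes repeat from step to step, so only the first step pays for an allocation. A real cache would also need a policy for bounding memory when shapes vary, which this sketch omits.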
