Fixes and Enhancements for Mamba Inference and Reference Implementations by mohiuddin-khan-shiam · Pull Request #743 · state-spaces/mamba

mohiuddin-khan-shiam · 2025-06-08T19:02:58Z

This pull request addresses several bugs and limitations within the Mamba codebase, primarily aimed at improving inference robustness in the Mamba2 module and increasing the accuracy of reference implementations.

Key changes include:

In mamba_ssm/modules/mamba2.py:
- Resolved an issue in _get_states_from_cache to correctly handle dynamic batch sizes during inference, ensuring proper state re-initialization when batch sizes change.
- Removed the batch == 1 assertion in the forward method for variable-length sequence inference, enabling batched processing for these inputs.
- Updated the fallback path in the step method to support ngroups > 1, allowing grouped SSM inference even if Triton kernels are not available.
In mamba_ssm/ops/selective_scan_interface.py:
- Added optional RMS Normalization for B, C, and delta tensors within mamba_inner_ref to better match the main MambaInnerFn's behavior and improve numerical consistency.
- Corrected a shape comment in selective_scan_ref for clarity.
In mamba_ssm/models/mixer_seq_simple.py:
- Removed a redundant comment in the _init_weights function.
In mamba_ssm/utils/hf.py:
- Addressed a bug in load_state_dict_hf to ensure correct dtype conversion and device placement when loading Hugging Face model weights.
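
To illustrate the dynamic batch size item above, here is a minimal, dependency-free sketch of a per-layer state cache that re-initializes its state when the batch size changes. `LayerState` and `get_state` are hypothetical names chosen for this example. The real `_get_states_from_cache` operates on tensors and may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class LayerState:
    """Hypothetical cache entry: one zero-initialized state row per batch element."""
    batch_size: int
    state_dim: int
    rows: list = field(init=False)

    def __post_init__(self):
        self.rows = [[0.0] * self.state_dim for _ in range(self.batch_size)]


def get_state(cache: dict, layer_idx: int, batch_size: int, state_dim: int) -> LayerState:
    """Return the cached state for a layer, re-allocating if the batch size changed."""
    entry = cache.get(layer_idx)
    if entry is None or entry.batch_size != batch_size:
        # An entry sized for a different batch cannot be reused,
        # so replace it with freshly zeroed state.
        entry = LayerState(batch_size, state_dim)
        cache[layer_idx] = entry
    return entry


cache = {}
first = get_state(cache, layer_idx=0, batch_size=2, state_dim=4)
first.rows[0][0] = 1.5  # pretend a decoding step wrote into the state
same = get_state(cache, layer_idx=0, batch_size=2, state_dim=4)     # reused
resized = get_state(cache, layer_idx=0, batch_size=3, state_dim=4)  # re-initialized
```

Zeroing on a size mismatch is one possible policy. It trades continuity of the old state for a guarantee that shapes always match the incoming batch.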

These modifications enhance the stability, flexibility, and correctness of the Mamba library.
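
For readers unfamiliar with the normalization mentioned for `mamba_inner_ref`, the following is a small sketch of RMS normalization applied optionally to `delta`, `B`, and `C`. It uses plain Python lists rather than tensors, and the function names, the `eps` default, and the weight arguments are assumptions made for illustration, not the signatures used in the PR.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by an optional weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        return [v * scale for v in x]
    return [v * scale * w for v, w in zip(x, weight)]


def normalize_dt_b_c(delta, B, C, dt_weight=None, b_weight=None, c_weight=None):
    """Hypothetical helper: normalize each vector only if its weight is provided."""
    if dt_weight is not None:
        delta = rms_norm(delta, dt_weight)
    if b_weight is not None:
        B = rms_norm(B, b_weight)
    if c_weight is not None:
        C = rms_norm(C, c_weight)
    return delta, B, C


unit = rms_norm([3.0, 4.0], eps=0.0)
delta, B, C = normalize_dt_b_c([1.0, 1.0], [2.0, 0.0], [0.5, 0.5], b_weight=[1.0, 1.0])
```

With `eps=0.0` the output has a root mean square of 1, and vectors whose weight is `None` pass through unchanged, which is what makes the normalization optional.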


zju-sqs · 2025-10-21T01:58:26Z

This code currently supports only one-step decoding (in selective_state_update.py), which makes it hard to train streaming models, e.g. chunk-wise ASR. Such models need to pass ssm_states between chunks and update them from a [bs, seq_len, hidden_dim] tensor, not a [bs, 1, hidden_dim] tensor. Does your code support updating ssm_states with seq_len >= 2?
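
The chunk-wise state passing described in this question can be sketched with a toy recurrence. This is only an illustration of the idea, not a statement about what this PR or selective_state_update.py supports. It assumes a scalar state and fixed coefficients `a`, `b`, `c`, whereas Mamba's parameters are input-dependent. `scan_chunk` is a hypothetical name.

```python
def scan_chunk(xs, a, b, c, h0=0.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over one chunk.

    Returns the chunk outputs and the final state, so the state can be
    handed to the next chunk.
    """
    h = h0
    ys = []
    for x_t in xs:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys, h


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a, b, c = 0.5, 1.0, 2.0

# Reference: process the whole sequence in one call.
full_ys, full_h = scan_chunk(xs, a, b, c)

# Streaming: process chunks of length 2, carrying the state between them.
chunk_ys, h = [], 0.0
for start in range(0, len(xs), 2):
    out, h = scan_chunk(xs[start:start + 2], a, b, c, h0=h)
    chunk_ys.extend(out)
```

Because each chunk starts from the previous chunk's final state, the streamed outputs match the full-sequence outputs. A multi-step state update would need to preserve the same property.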

mohiuddin-khan-shiam wants to merge 1 commit into state-spaces:main from mohiuddin-khan-shiam:main
