Hi,
I am seeing a runtime error when using MAMBA_SSM on a B200 GPU.
The same setup works correctly on H100, but on B200 the training job fails during the first forward pass with:
"torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device"
From the traceback, the failure appears to occur inside the custom CUDA extension used by "MAMBA_SSM", specifically:
"causal_conv1d_cuda.causal_conv1d_fwd"
PyTorch detects the GPU correctly and the training job starts, but it crashes as soon as that CUDA kernel is invoked. I also tried a fresh virtual environment with a clean reinstall of the relevant packages, and the same error persists on B200.
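For reference, here is a small helper that checks whether a list of compiled CUDA targets covers a given GPU. It is only a sketch under simplified compatibility rules. SASS (`sm_XY`) runs on devices with the same major version and an equal or higher minor version. Plain PTX (`compute_XY`) can be JIT-compiled for equal or newer devices. Arch-specific targets with a letter suffix (such as `sm_90a`) are treated as exact-match only. The function names are hypothetical. B200 reports compute capability 10.0 and H100 reports 9.0. A build that contains only `sm_90`-era SASS and no PTX would therefore be consistent with this error.

```python
import re

# Target entries look like "sm_90", "sm_90a", "compute_90": the same format
# returned by torch.cuda.get_arch_list(). The last digit is the minor version.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)([a-z]?)$")


def parse_arch(entry):
    """Split e.g. "sm_100" into ("sm", 10, 0, ""); return None if unrecognized."""
    m = _ARCH_RE.match(entry)
    if m is None:
        return None
    kind, major, minor, suffix = m.groups()
    return kind, int(major), int(minor), suffix


def can_run_on(capability, arch_list):
    """Return True if any target in arch_list can execute on a GPU with the
    given (major, minor) compute capability, under simplified rules."""
    dev = tuple(capability)
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, major, minor, suffix = parsed
        if suffix:
            # Simplification: arch-specific targets (e.g. sm_90a) are treated
            # as usable only on that exact architecture, and only as SASS.
            if kind == "sm" and (major, minor) == dev:
                return True
        elif kind == "sm":
            # SASS: binary compatible within the same major version only.
            if major == dev[0] and minor <= dev[1]:
                return True
        elif (major, minor) <= dev:
            # PTX: the driver can JIT-compile it for equal or newer GPUs.
            return True
    return False
```

On the B200 node, `can_run_on(torch.cuda.get_device_capability(), torch.cuda.get_arch_list())` checks PyTorch's own build. The targets embedded in the causal-conv1d extension are not reported by `torch.cuda.get_arch_list()`. They could instead be listed by running `cuobjdump --list-elf` and `cuobjdump --list-ptx` on the compiled extension `.so` file.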
Do MAMBA_SSM and CAUSAL-CONV1D currently support B200? Is a specific build flag or installation step required for this GPU architecture?
Thanks!