
MedNeXt (versions 1 and 2) experiments, VRAM and runtime reporting #2996

@Shrajan

Description

Hi nnUNet team,

I have some questions regarding the experimental setup and reported results in the MedNeXt-v1 [1], MedNeXt-v2 [2], and nnU-Net Revisited [3] papers. I have been trying to carefully reproduce the experiments using the official nnU-Net framework, the MedNeXt repository, and the published settings.

  1. I would like to ask about the GPU VRAM usage reported in Table 1 of the nnU-Net Revisited paper [3]. I reproduced the experiments using the official code, the same presets and patch size, with checkpointing and AMP enabled, on an A100 40 GB GPU. The runtime scaling between ResEnc L and MedNeXt L k3 closely matches what is reported in the paper, so I believe my setup is aligned with the benchmark. However, the allocated memory for MedNeXt L k3 is noticeably higher in my experiments: in my case MedNeXt uses more memory than ResEnc L, whereas Table 1 [3] reports it using less. MedNeXt L k5 uses a similar amount of GPU VRAM to MedNeXt L k3 but, as expected, takes longer to run.

For ResEnc UNet L on A100 40 GB, my results match closely:

Training time ≈ 28 hours (paper: 35 hours)
Allocated CUDA memory ≈ 21.5 GB (paper: 22.7 GB)
Reserved CUDA memory ≈ 22.6 GB
Nvidia-smi ≈ 23.5 GB

However, for MedNeXt L k3 on A100 40 GB, I observe:

Training time ≈ 60 hours (paper: 68 hours)
Allocated CUDA memory ≈ 24.9 GB (paper: 17.3 GB)
Reserved CUDA memory ≈ 33.3 GB
Nvidia-smi ≈ 34.6 GB

I even trained the MedNeXt models on an RTX 5090 (32 GB) GPU, and even then the VRAM usage did not drop below 20 GB.
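For reference, this is roughly how I collected the three numbers above. The helper names are my own, and I am assuming PyTorch's peak statistics are the relevant ones; whether the paper used these or nvidia-smi is part of what I am asking.

```python
# Sketch of how I collected the numbers above (my own helpers, not from
# the nnU-Net codebase). PyTorch separates "allocated" memory (bytes held
# by live tensors) from "reserved" memory (the caching allocator's pool);
# nvidia-smi additionally includes the CUDA context, so it is usually the
# largest of the three figures.

def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB for reporting."""
    return n_bytes / 1024 ** 3

def report_peak_vram(device: int = 0) -> dict:
    """Peak CUDA memory statistics after a training run, in GiB."""
    import torch  # assumed available, since nnU-Net training requires it
    if not torch.cuda.is_available():
        return {}
    return {
        "allocated_gib": gib(torch.cuda.max_memory_allocated(device)),
        "reserved_gib": gib(torch.cuda.max_memory_reserved(device)),
    }

# Note: torch.cuda.reset_peak_memory_stats(device) should be called before
# training so the peaks reflect only the run being measured.
```

The nvidia-smi figure was read separately during training, which is why it exceeds both PyTorch numbers in my tables above.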

  2. Could you please clarify:
    a) Were any additional settings used for MedNeXt [3] in the benchmark?
    b) How exactly was VRAM measured for Table 1 [3] (allocated vs. reserved memory, or nvidia-smi)?
    c) Was a slightly different MedNeXt configuration or commit version used internally?
    d) Is the runtime reported in Table 1 [3] measured per fold (i.e., an average), rather than aggregated over all 5 cross-validation folds?

  3. Finally, I noticed that the DSC values of nnU-Net, ResEnc UNet L, and MedNeXt-v1 trained from scratch differ slightly between the MedNeXt-v1 [1] and MedNeXt-v2 [2] papers (for example on AMOS, KiTS, and ACDC). I understand that different splits or protocols may have been used, but I would appreciate clarification on this as well.

Thank you very much for your time and help.

References:
[1] MedNeXt v1 - https://conferences.miccai.org/2023/papers/410-Paper1656.html
[2] MedNeXt v2 - https://arxiv.org/pdf/2512.17774
[3] nnU-Net Revisited - https://papers.miccai.org/miccai-2024/paper/2847_paper.pdf
