
MedNeXt (versions 1 and 2) experiments, VRAM and runtime reporting #2996

@Shrajan

Description

Hi nnUNet team,

I have some questions regarding the experimental setup and reported results in the MedNeXt-v1 [1], MedNeXt-v2 [2], and nnU-Net Revisited [3] papers. I have been trying to carefully reproduce the experiments using the official nnU-Net framework, the MedNeXt repository, and the published settings.

  1. I would like to ask about the GPU VRAM usage reported in Table 1 of the nnU-Net Revisited paper [3]. I reproduced the experiments using the official code, the same presets and patch size, with checkpointing and AMP enabled, on an A100 40 GB GPU. The runtime scaling between ResEnc L and MedNeXt L k3 closely matches what is reported in the paper, so I believe my setup is aligned with the benchmark. However, the allocated memory for MedNeXt L k3 is noticeably higher in my experiments: in my case MedNeXt uses more memory than ResEnc L, whereas Table 1 [3] reports it using less. MedNeXt L k5 uses a similar amount of GPU VRAM to MedNeXt L k3 but, as expected, takes longer to run.

For ResEnc UNet L on A100 40 GB, my results match closely:

Training time ≈ 28 hours (paper: 35 hours)
Allocated CUDA memory ≈ 21.5 GB (paper: 22.7 GB)
Reserved CUDA memory ≈ 22.6 GB
Nvidia-smi ≈ 23.5 GB

However, for MedNeXt L k3 on A100 40 GB, I observe:

Training time ≈ 60 hours (paper: 68 hours)
Allocated CUDA memory ≈ 24.9 GB (paper: 17.3 GB)
Reserved CUDA memory ≈ 33.3 GB
Nvidia-smi ≈ 34.6 GB

I even trained the MedNeXt models on an RTX 5090 (32 GB) GPU, and even then the VRAM usage did not drop below 20 GB.
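For reference, this is roughly how I collected the three numbers above. The helper names are my own, and I am assuming PyTorch's peak statistics are the relevant ones; whether the paper used these or nvidia-smi is part of what I am asking.

```python
# Sketch of how I collected the numbers above (my own helpers, not from
# the nnU-Net codebase). PyTorch separates "allocated" memory (bytes held
# by live tensors) from "reserved" memory (the caching allocator's pool);
# nvidia-smi additionally includes the CUDA context, so it is usually the
# largest of the three figures.

def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB for reporting."""
    return n_bytes / 1024 ** 3

def report_peak_vram(device: int = 0) -> dict:
    """Peak CUDA memory statistics after a training run, in GiB."""
    import torch  # assumed available, since nnU-Net training requires it
    if not torch.cuda.is_available():
        return {}
    return {
        "allocated_gib": gib(torch.cuda.max_memory_allocated(device)),
        "reserved_gib": gib(torch.cuda.max_memory_reserved(device)),
    }

# Note: torch.cuda.reset_peak_memory_stats(device) should be called before
# training so the peaks reflect only the run being measured.
```

The nvidia-smi figure was read separately during training, which is why it exceeds both PyTorch numbers in my tables above.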

  2. Could you please clarify:
    a) Were any additional settings used for MedNeXt [3] in the benchmark?
    b) How exactly was VRAM measured for Table 1 [3] (allocated vs. reserved memory, or nvidia-smi)?
    c) Was a slightly different MedNeXt configuration or commit version used internally?
    d) Is the runtime reported in Table 1 [3] measured per fold (i.e., an average), rather than aggregated over all 5 cross-validation folds?

  3. Finally, I noticed that the DSC values of nnU-Net, ResEnc UNet L, and MedNeXt-v1 trained from scratch differ slightly between the MedNeXt-v1 [1] and MedNeXt-v2 [2] papers (for example on AMOS, KiTS, and ACDC). I understand that different splits or protocols may have been used, but I would appreciate clarification on this as well.

Thank you very much for your time and help.

References:
[1] MedNeXt v1 - https://conferences.miccai.org/2023/papers/410-Paper1656.html
[2] MedNeXt v2 - https://arxiv.org/pdf/2512.17774
[3] nnU-Net Revisited - https://papers.miccai.org/miccai-2024/paper/2847_paper.pdf
