
Black / Blank Output Video in test_stage_2.py #92

@smilekison

Description


🧠 Describe the Issue

When running MusePose on Windows (GPU with 4 GB VRAM), the test_stage_2.py script completes successfully (no errors or crashes), but the generated output video is completely black.

The script prints normal progress bars, produces MP4 files in results/…, and logs look clean. However, every frame in the resulting video is just a black screen.

I am using the assets from the repository itself.

I already tried:

  • Verified pose_align.py output (align/img_ref_video_dance.mp4) is valid and shows correct pose sequence.
  • Set smaller parameters for low VRAM (-W 160 -H 160 -S 3 -O 1 --steps 12 --cfg 1.2 --skip 7 -L 30).
  • Patched musepose/utils/util.py with torch.nan_to_num(...).clamp(0,1) to handle NaNs.
  • Enabled rescale=True in all save_videos_grid() calls inside test_stage_2.py.
  • Checked that tensors are within [-1, 1] before saving.
  • Confirmed no runtime warnings or CUDA errors remain.

Even with these, output videos remain black.
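For reference, the NaN sanitization I added in musepose/utils/util.py looks roughly like this (the helper name is mine; the function it lives in upstream may differ):

```python
import torch

def sanitize_frames(videos: torch.Tensor) -> torch.Tensor:
    """Replace NaN/Inf values and clamp to [0, 1] before saving frames.

    `videos` is assumed to already be rescaled into [0, 1] pixel space.
    """
    # nan_to_num maps NaN -> 0.0 and +/-Inf to large finite values;
    # the clamp then pulls everything back into the valid pixel range.
    return torch.nan_to_num(videos).clamp(0, 1)
```

With this in place, a tensor full of NaNs saves as solid black (all zeros), which is consistent with what I am seeing, so the clamp may be masking upstream NaNs rather than fixing them.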

βš™οΈ Environment
OS | Windows 11
Python | 3.10.x
PyTorch | 2.1.2 + cu121
TorchVision | 0.16.2 + cu121
MMCV | 2.1.0
MMDet | 3.2.0
MMPose | 1.3.1
GPU | NVIDIA (4 GB VRAM)
python test_stage_2.py --config ./configs/test_stage_2.yaml -W 160 -H 160 -S 3 -O 1 --steps 12 --cfg 1.2 --skip 7 -L 30

πŸ“ Files Involved

  • pose_align.py (runs correctly, produces valid aligned pose video)
  • test_stage_2.py (runs without crash but outputs black)
  • musepose/utils/util.py (patched to sanitize NaNs)
  • configs/test_stage_2.yaml (uses weight_dtype: fp16)

Console output:

```
Width: 160 Height: 160 Length: 300 Slice: 3 Overlap: 1 Classifier free guidance: 1.2 DDIM sampling steps: 12 skip 7
pose video has 60 frames, with 30 fps
processing length: 8
fps 3
100%|████████████████████████████████████████████████████████| 12/12 [00:52<00:00, 4.33s/it]
100%|████████████████████████████████████████████████████████| 3/3 [00:06<00:00, 2.28s/it]
```

(no errors; output MP4 saved but appears black)
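To rule out a player or codec display issue, I checked the raw pixel statistics of the saved video. A sketch of the check (the helper name is mine; decoding the MP4 assumes imageio with the ffmpeg plugin is installed):

```python
import numpy as np

def frames_are_black(frames: np.ndarray, threshold: float = 1.0) -> bool:
    """Return True if every frame's mean intensity is below `threshold`.

    `frames` is expected to have shape (num_frames, H, W, C) with uint8 pixels.
    """
    per_frame_mean = frames.reshape(frames.shape[0], -1).mean(axis=1)
    return bool((per_frame_mean < threshold).all())

# Usage with a real output file (requires imageio + imageio-ffmpeg):
#   import imageio.v3 as iio
#   frames = iio.imread("results/ref_img_ref_video_dance_1.2_12_7.mp4")
#   print(frames_are_black(frames))
```

Every frame of my outputs reports a near-zero mean, so the pixels really are black, not just rendered that way.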

❓ Questions for Maintainers

  1. Is there a post-processing normalization step missing in save_videos_grid()?
  2. Could the UNet/VAE weights or fp16 precision cause NaNs that collapse after clamping?
  3. Are there known issues running test_stage_2.py with low-resolution or short sequences on small GPUs?
  4. Could the diffusion model require a minimum resolution (e.g. 512 × 512) to generate meaningful output?
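Regarding question 1: my mental model of what the rescale step in save_videos_grid() should do is below. This is a sketch of my assumption, not the repository's actual code, so please correct me if the saving path expects a different input range:

```python
import torch

def rescale_for_save(videos: torch.Tensor, rescale: bool = True) -> torch.Tensor:
    """Map model output into [0, 1] pixel space before writing frames.

    With rescale=True the input is assumed to be in [-1, 1]; applying this
    to a tensor that is already in [0, 1] would wash frames toward gray,
    while skipping it on [-1, 1] data would clamp half the range to black.
    """
    if rescale:
        videos = (videos + 1.0) / 2.0  # [-1, 1] -> [0, 1]
    return videos.clamp(0, 1)
```

If the pipeline's output is already in [0, 1], then my enabling rescale=True everywhere (see "I already tried" above) could itself darken frames, though it should not produce pure black.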

The output I get:

ref_img_ref_video_dance_1.2_12_7.mp4
ref_img_ref_video_dance_1.2_12_7__.mp4
