Description
Describe the Issue
When running MusePose on Windows (GPU with 4 GB VRAM), the test_stage_2.py script completes successfully, with no errors or crashes, but the generated output video is completely black.
The script prints normal progress bars, produces MP4 files in results/…, and the logs look clean. However, every frame in the resulting video is just a black screen.
I am using the assets from the repository itself.
I already tried:
- Verified pose_align.py output (align/img_ref_video_dance.mp4) is valid and shows correct pose sequence.
- Set smaller parameters for low VRAM (-W 160 -H 160 -S 3 -O 1 --steps 12 --cfg 1.2 --skip 7 -L 30).
- Patched musepose/utils/util.py with torch.nan_to_num(...).clamp(0,1) to handle NaNs.
- Enabled rescale=True in all save_videos_grid() calls inside test_stage_2.py.
- Checked that tensors are within [-1, 1] before saving.
- Confirmed no runtime warnings or CUDA errors remain.
Even with these changes, the output videos remain black.
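To illustrate why I suspect fp16 (this is a standalone NumPy sketch, not MusePose code): float16 overflows to inf above roughly 65504, downstream arithmetic on inf can produce NaN, and my torch.nan_to_num(...).clamp(0,1) patch then maps those NaNs to 0, i.e. black pixels.

```python
import numpy as np

# fp16 saturates to inf above ~65504
x = np.float32([70000.0, 1.0]).astype(np.float16)

with np.errstate(invalid="ignore"):
    diff = x - x  # inf - inf = nan; 1.0 - 1.0 = 0.0

# the same sanitize-then-clamp pattern as my util.py patch:
frame = np.nan_to_num(diff, nan=0.0).clip(0, 1)
print(frame)  # all zeros -> would render as a black frame
```

So the patch hides the NaNs but cannot recover the signal; if NaNs appear early in the pipeline, black output is exactly what I would expect.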
Environment
| Component | Version |
| --- | --- |
| OS | Windows 11 |
| Python | 3.10.x |
| PyTorch | 2.1.2 + cu121 |
| TorchVision | 0.16.2 + cu121 |
| MMCV | 2.1.0 |
| MMDet | 3.2.0 |
| MMPose | 1.3.1 |
| GPU | NVIDIA (4 GB VRAM) |
Command:

```
python test_stage_2.py --config ./configs/test_stage_2.yaml -W 160 -H 160 -S 3 -O 1 --steps 12 --cfg 1.2 --skip 7 -L 30
```
Files Involved
- pose_align.py (runs correctly, produces valid aligned pose video)
- test_stage_2.py (runs without crash but outputs black)
- musepose/utils/util.py (patched to sanitize NaNs)
- configs/test_stage_2.yaml (uses weight_dtype: fp16)
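One cheap experiment I have not yet run (and I am assuming fp32 is an accepted value here, so please correct me if not): switching the weight dtype in the config to rule out half-precision overflow entirely.

```yaml
# configs/test_stage_2.yaml (sketch): trade VRAM for numerical headroom
weight_dtype: "fp32"   # instead of fp16
```

On a 4 GB GPU this may OOM at larger sizes, but even a single fp32 run at 160x160 would tell us whether fp16 is the culprit.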
Log output:

```
Width: 160 Height: 160 Length: 300 Slice: 3 Overlap: 1
Classifier free guidance: 1.2 DDIM sampling steps: 12 skip 7
pose video has 60 frames, with 30 fps
processing length: 8 fps 3
100%|████████████████████████████████████████| 12/12 [00:52<00:00, 4.33s/it]
100%|████████████████████████████████████████| 3/3 [00:06<00:00, 2.28s/it]
```

(no errors; output MP4 saved but appears black)
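For reference, this is the kind of check I used to confirm the decoded frames really are all zero (frame_stats is my own helper, shown here on a simulated array so it does not depend on a video decoder):

```python
import numpy as np

def frame_stats(frames: np.ndarray) -> dict:
    """Basic statistics for a (T, H, W, C) frame array."""
    return {
        "min": float(frames.min()),
        "max": float(frames.max()),
        "mean": float(frames.mean()),
        "all_black": bool(frames.max() == 0),
    }

# simulated clip matching my output shape: 8 frames of 160x160 RGB
frames = np.zeros((8, 160, 160, 3), dtype=np.uint8)
print(frame_stats(frames))
# {'min': 0.0, 'max': 0.0, 'mean': 0.0, 'all_black': True}
```

Running the same check on frames decoded from the saved MP4 gives max = 0 across every frame, so this is not a player or codec display issue.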
Questions for Maintainers
- Is there a post-processing normalization step missing in save_videos_grid()?
- Could the UNet/VAE weights or fp16 precision cause NaNs that collapse after clamping?
- Are there known issues running test_stage_2.py with low-resolution or short sequences on small GPUs?
- Could the diffusion model require a minimum resolution (e.g. 512 Γ 512) to generate meaningful output?
The output I get: