Getting grey image for some image sizes (Zimage)

I compiled using `CMAKE_ARGS="-DSD_CUDA=ON" pip install stable-diffusion-cpp-python` as per the instructions. I'm using an RTX 3050 Laptop GPU (4 GB VRAM) on a fresh Ubuntu install with 16 GB RAM.

When running the [Z-image example](https://github.com/william-murray1204/stable-diffusion-cpp-python?tab=readme-ov-file#z-image) I get a grey image. At first I thought it had something to do with the VAE, but I ruled that out. Changing the size to 512x512 resulted in a good image. 768x768 gives a grey image; 512x768 is OK.
The run does not abort with an OOM or any other error; it just produces a grey image.

Below is the log:

```
$ python zimage.py 
stable-diffusion.cpp:160  - Using CUDA backend
ggml_extend.hpp:77   - ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_extend.hpp:77   - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_extend.hpp:77   - ggml_cuda_init: found 1 CUDA devices:
ggml_extend.hpp:77   -   Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
stable-diffusion.cpp:235  - loading diffusion model from 'models/z_image_turbo-Q3_K.gguf'
model.cpp:370  - load models/z_image_turbo-Q3_K.gguf using gguf format
model.cpp:412  - init from 'models/z_image_turbo-Q3_K.gguf'
stable-diffusion.cpp:282  - loading llm from 'models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
model.cpp:370  - load models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf using gguf format
model.cpp:412  - init from 'models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
stable-diffusion.cpp:296  - loading vae from 'models/ae.safetensors'
model.cpp:373  - load models/ae.safetensors using safetensors format
model.cpp:503  - init from 'models/ae.safetensors', prefix = 'vae.'
stable-diffusion.cpp:312  - Version: Z-Image 
stable-diffusion.cpp:340  - Weight type stat:                      f32: 640  |    q8_0: 22   |    q3_K: 180  |    q4_K: 216  |    q6_K: 37   
stable-diffusion.cpp:341  - Conditioner weight type stat:          f32: 145  |    q4_K: 216  |    q6_K: 37   
stable-diffusion.cpp:342  - Diffusion model weight type stat:      f32: 251  |    q8_0: 22   |    q3_K: 180  
stable-diffusion.cpp:343  - VAE weight type stat:                  f32: 244  
stable-diffusion.cpp:345  - ggml tensor size = 400 bytes
llm.hpp:285  - merges size 151387
llm.hpp:317  - vocab size: 151669
llm.hpp:1130 - llm: num_layers = 36, vocab_size = 151936, hidden_size = 2560, intermediate_size = 9728
stable-diffusion.cpp:540  - Using flash attention in the diffusion model
ggml_extend.hpp:1883 - qwen3 params backend buffer size =  3555.38 MB(RAM) (398 tensors)
ggml_extend.hpp:1883 - z_image params backend buffer size =  2997.90 MB(RAM) (453 tensors)
ggml_extend.hpp:1883 - vae params backend buffer size =  160.00 MB(RAM) (244 tensors)
stable-diffusion.cpp:688  - loading weights
model.cpp:1351 - using 10 threads for model loading
model.cpp:1373 - loading tensors from models/z_image_turbo-Q3_K.gguf
  |====================>                             | 453/1095 - 565.54it/s
model.cpp:1373 - loading tensors from models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf
  |======================================>           | 851/1095 - 417.36it/s
model.cpp:1373 - loading tensors from models/ae.safetensors
  |==================================================| 1095/1095 - 488.84it/s
model.cpp:1574 - loading tensors completed, taking 2.24s (process: 0.00s, read: 1.45s, memcpy: 0.00s, convert: 0.08s, copy_to_backend: 0.00s)
stable-diffusion.cpp:720  - finished loaded file
stable-diffusion.cpp:777  - total params memory size = 6713.28MB (VRAM 6713.28MB, RAM 0.00MB): text_encoders 3555.38MB(VRAM), diffusion_model 2997.90MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
stable-diffusion.cpp:860  - running in FLOW mode
System Info: 
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 | 
stable-diffusion.cpp:3163 - generate_image 1024x512
stable-diffusion.cpp:3197 - sampling using Euler method
denoiser.hpp:364  - get_sigmas with discrete scheduler
stable-diffusion.cpp:3320 - TXT2IMG
conditioner.hpp:1679 - parse '<|im_start|>user
A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>user
', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic', 1], ['<|im_end|>
<|im_start|>assistant
', 1], ]
llm.hpp:259  - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
llm.hpp:259  - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", "ĠThe", "Ġcity", "Ġlights", "Ġare", "Ġa", "Ġchaotic", "Ġblur", "Ġof", "Ġneon", "Ġorange", "Ġand", "Ġcool", "Ġblue", ",", "Ġreflecting", "Ġon", "Ġthe", "Ġwet", "Ġasphalt", ".", "ĠThe", "Ġscene", "Ġev", "okes", "Ġa", "Ġsense", "Ġof", "Ġbeing", "Ġa", "Ġsingle", "Ġcomponent", "Ġin", "Ġa", "Ġvast", "Ġmachine", ".", "ĠSuper", "im", "posed", "Ġover", "Ġthe", "Ġimage", "Ġin", "Ġa", "Ġsleek", ",", "Ġmodern", ",", "Ġslightly", "Ġglitch", "ed", "Ġfont", "Ġis", "Ġthe", "Ġphilosophical", "Ġquote", ":", "Ġ'", "THE", "ĠCITY", "ĠIS", "ĠA", "ĠC", "IR", "CU", "IT", "ĠBOARD", ",", "ĠAND", "ĠI", "ĠAM", "ĠA", "ĠBRO", "KEN", "ĠTRANS", "IST", "OR", ".'", "Ġ--", "Ġmo", "ody", ",", "Ġatmospheric", ",", "Ġprofound", ",", "Ġdark", "Ġacademic", ]
llm.hpp:259  - split prompt "<|im_end|>
<|im_start|>assistant
" to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
ggml_extend.hpp:1796 - qwen3 offload params (3555.38 MB, 398 tensors) to runtime backend (CUDA0), taking 0.54s
ggml_extend.hpp:1698 - qwen3 compute buffer size: 13.34 MB(VRAM)
conditioner.hpp:1892 - computing condition graph completed, taking 674 ms
stable-diffusion.cpp:2941 - get_learned_condition completed, taking 676 ms
stable-diffusion.cpp:3052 - generating image: 1/1 - seed 42
ggml_extend.hpp:1796 - z_image offload params (2997.93 MB, 453 tensors) to runtime backend (CUDA0), taking 0.44s
ggml_extend.hpp:1698 - z_image compute buffer size: 292.48 MB(VRAM)
  |==================================================| 20/20 - 2.48s/it
stable-diffusion.cpp:3094 - sampling completed, taking 49.60s
stable-diffusion.cpp:3105 - generating 1 latent images completed, taking 49.60s
stable-diffusion.cpp:3108 - decoding 1 latents
ggml_extend.hpp:1796 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.09s
ggml_extend.hpp:83   - ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3328.50 MiB on device 0: cudaMalloc failed: out of memory
ggml_extend.hpp:83   - ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 3490185472
ggml_extend.hpp:1691 - vae: failed to allocate the compute buffer
ggml_extend.hpp:1961 - vae alloc compute buffer failed
stable-diffusion.cpp:2313 - computing vae decode graph completed, taking 0.10s
stable-diffusion.cpp:3118 - latent 1 decoded, taking 0.10s
stable-diffusion.cpp:3122 - decode_first_stage completed, taking 0.10s
stable-diffusion.cpp:3428 - generate_image completed in 50.37s
```
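One thing that stands out in the log: although the run completes, the VAE decode step reports `cudaMalloc failed: out of memory` while trying to allocate a 3328.50 MiB compute buffer for the 1024x512 request. The sketch below is a rough back-of-the-envelope check, not something taken from stable-diffusion.cpp. It assumes the VAE compute buffer scales linearly with pixel count, which is unverified, and `estimate_vae_buffer_mib` is a hypothetical helper name. The byte count and resolution are copied from the log.

```python
# Figures copied from the log above (the 1024x512 run).
FAILED_ALLOC_BYTES = 3490185472
LOGGED_WIDTH, LOGGED_HEIGHT = 1024, 512

MIB = 1024 * 1024


def estimate_vae_buffer_mib(width, height):
    """Hypothetical estimate: scale the logged buffer size by pixel count.

    Assumes linear scaling with width * height, which is NOT confirmed
    against the actual allocator behaviour.
    """
    bytes_per_pixel = FAILED_ALLOC_BYTES / (LOGGED_WIDTH * LOGGED_HEIGHT)
    return bytes_per_pixel * width * height / MIB


# Sizes mentioned in this report, plus the size shown in the log.
for width, height in [(512, 512), (512, 768), (768, 768), (1024, 512)]:
    estimate = estimate_vae_buffer_mib(width, height)
    print(f"{width}x{height}: ~{estimate:.0f} MiB")
```

Under that assumption, 768x768 would need a larger buffer than the 1024x512 request that failed, while 512x512 and 512x768 would need less. That would be consistent with which sizes came out grey, though whether the smaller sizes actually fit depends on what else is resident in the 4 GB of VRAM at decode time.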

