Getting grey image for some image sizes (Zimage) #18

@kaosbeat

I compiled using CMAKE_ARGS="-DSD_CUDA=ON" pip install stable-diffusion-cpp-python, as per the instructions. I'm using an RTX 3050 Laptop GPU (4 GB VRAM) on a fresh Ubuntu install with 16 GB RAM.

When running the Z-Image example I get a grey image. At first I thought it had something to do with the VAE, but I ruled that out. Changing the size to 512x512 resulted in a good image. 768x768 = grey, 512x768 = OK.
I do not get an OOM or any other error, just a grey image.

Below is the log:

$ python zimage.py 
stable-diffusion.cpp:160  - Using CUDA backend
ggml_extend.hpp:77   - ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_extend.hpp:77   - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_extend.hpp:77   - ggml_cuda_init: found 1 CUDA devices:
ggml_extend.hpp:77   -   Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
stable-diffusion.cpp:235  - loading diffusion model from 'models/z_image_turbo-Q3_K.gguf'
model.cpp:370  - load models/z_image_turbo-Q3_K.gguf using gguf format
model.cpp:412  - init from 'models/z_image_turbo-Q3_K.gguf'
stable-diffusion.cpp:282  - loading llm from 'models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
model.cpp:370  - load models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf using gguf format
model.cpp:412  - init from 'models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
stable-diffusion.cpp:296  - loading vae from 'models/ae.safetensors'
model.cpp:373  - load models/ae.safetensors using safetensors format
model.cpp:503  - init from 'models/ae.safetensors', prefix = 'vae.'
stable-diffusion.cpp:312  - Version: Z-Image 
stable-diffusion.cpp:340  - Weight type stat:                      f32: 640  |    q8_0: 22   |    q3_K: 180  |    q4_K: 216  |    q6_K: 37   
stable-diffusion.cpp:341  - Conditioner weight type stat:          f32: 145  |    q4_K: 216  |    q6_K: 37   
stable-diffusion.cpp:342  - Diffusion model weight type stat:      f32: 251  |    q8_0: 22   |    q3_K: 180  
stable-diffusion.cpp:343  - VAE weight type stat:                  f32: 244  
stable-diffusion.cpp:345  - ggml tensor size = 400 bytes
llm.hpp:285  - merges size 151387
llm.hpp:317  - vocab size: 151669
llm.hpp:1130 - llm: num_layers = 36, vocab_size = 151936, hidden_size = 2560, intermediate_size = 9728
stable-diffusion.cpp:540  - Using flash attention in the diffusion model
ggml_extend.hpp:1883 - qwen3 params backend buffer size =  3555.38 MB(RAM) (398 tensors)
ggml_extend.hpp:1883 - z_image params backend buffer size =  2997.90 MB(RAM) (453 tensors)
ggml_extend.hpp:1883 - vae params backend buffer size =  160.00 MB(RAM) (244 tensors)
stable-diffusion.cpp:688  - loading weights
model.cpp:1351 - using 10 threads for model loading
model.cpp:1373 - loading tensors from models/z_image_turbo-Q3_K.gguf
  |====================>                             | 453/1095 - 565.54it/s
model.cpp:1373 - loading tensors from models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf
  |======================================>           | 851/1095 - 417.36it/s
model.cpp:1373 - loading tensors from models/ae.safetensors
  |==================================================| 1095/1095 - 488.84it/s
model.cpp:1574 - loading tensors completed, taking 2.24s (process: 0.00s, read: 1.45s, memcpy: 0.00s, convert: 0.08s, copy_to_backend: 0.00s)
stable-diffusion.cpp:720  - finished loaded file
stable-diffusion.cpp:777  - total params memory size = 6713.28MB (VRAM 6713.28MB, RAM 0.00MB): text_encoders 3555.38MB(VRAM), diffusion_model 2997.90MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
stable-diffusion.cpp:860  - running in FLOW mode
System Info: 
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 | 
stable-diffusion.cpp:3163 - generate_image 1024x512
stable-diffusion.cpp:3197 - sampling using Euler method
denoiser.hpp:364  - get_sigmas with discrete scheduler
stable-diffusion.cpp:3320 - TXT2IMG
conditioner.hpp:1679 - parse '<|im_start|>user
A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>user
', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic', 1], ['<|im_end|>
<|im_start|>assistant
', 1], ]
llm.hpp:259  - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
llm.hpp:259  - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", "ĠThe", "Ġcity", "Ġlights", "Ġare", "Ġa", "Ġchaotic", "Ġblur", "Ġof", "Ġneon", "Ġorange", "Ġand", "Ġcool", "Ġblue", ",", "Ġreflecting", "Ġon", "Ġthe", "Ġwet", "Ġasphalt", ".", "ĠThe", "Ġscene", "Ġev", "okes", "Ġa", "Ġsense", "Ġof", "Ġbeing", "Ġa", "Ġsingle", "Ġcomponent", "Ġin", "Ġa", "Ġvast", "Ġmachine", ".", "ĠSuper", "im", "posed", "Ġover", "Ġthe", "Ġimage", "Ġin", "Ġa", "Ġsleek", ",", "Ġmodern", ",", "Ġslightly", "Ġglitch", "ed", "Ġfont", "Ġis", "Ġthe", "Ġphilosophical", "Ġquote", ":", "Ġ'", "THE", "ĠCITY", "ĠIS", "ĠA", "ĠC", "IR", "CU", "IT", "ĠBOARD", ",", "ĠAND", "ĠI", "ĠAM", "ĠA", "ĠBRO", "KEN", "ĠTRANS", "IST", "OR", ".'", "Ġ--", "Ġmo", "ody", ",", "Ġatmospheric", ",", "Ġprofound", ",", "Ġdark", "Ġacademic", ]
llm.hpp:259  - split prompt "<|im_end|>
<|im_start|>assistant
" to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
ggml_extend.hpp:1796 - qwen3 offload params (3555.38 MB, 398 tensors) to runtime backend (CUDA0), taking 0.54s
ggml_extend.hpp:1698 - qwen3 compute buffer size: 13.34 MB(VRAM)
conditioner.hpp:1892 - computing condition graph completed, taking 674 ms
stable-diffusion.cpp:2941 - get_learned_condition completed, taking 676 ms
stable-diffusion.cpp:3052 - generating image: 1/1 - seed 42
ggml_extend.hpp:1796 - z_image offload params (2997.93 MB, 453 tensors) to runtime backend (CUDA0), taking 0.44s
ggml_extend.hpp:1698 - z_image compute buffer size: 292.48 MB(VRAM)
  |==================================================| 20/20 - 2.48s/it
stable-diffusion.cpp:3094 - sampling completed, taking 49.60s
stable-diffusion.cpp:3105 - generating 1 latent images completed, taking 49.60s
stable-diffusion.cpp:3108 - decoding 1 latents
ggml_extend.hpp:1796 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.09s
ggml_extend.hpp:83   - ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3328.50 MiB on device 0: cudaMalloc failed: out of memory
ggml_extend.hpp:83   - ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 3490185472
ggml_extend.hpp:1691 - vae: failed to allocate the compute buffer
ggml_extend.hpp:1961 - vae alloc compute buffer failed
stable-diffusion.cpp:2313 - computing vae decode graph completed, taking 0.10s
stable-diffusion.cpp:3118 - latent 1 decoded, taking 0.10s
stable-diffusion.cpp:3122 - decode_first_stage completed, taking 0.10s
stable-diffusion.cpp:3428 - generate_image completed in 50.37s
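
Although the script exits without raising an error, the log above does contain a "cudaMalloc failed: out of memory" line while the VAE compute buffer is being allocated, followed by a decode step that reports completing in 0.10s. That may be related to the grey output. Below is a minimal sketch for spotting such lines in captured log text. It assumes the log has been saved as a string; the function names and marker list are hypothetical helpers and not part of stable-diffusion-cpp-python.

```python
import re

# Substrings taken from the log above that indicate a failed buffer allocation.
# This list is an assumption based on this one log, not an exhaustive set.
FAILURE_MARKERS = (
    "out of memory",
    "failed to allocate",
    "alloc compute buffer failed",
)


def find_alloc_failures(log_text):
    """Return the log lines that contain any of the failure markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


def requested_mib(log_text):
    """Return the size from the first 'buffer of size N' line in MiB, or None."""
    match = re.search(r"buffer of size (\d+)", log_text)
    return int(match.group(1)) / (1024 * 1024) if match else None


# Lines copied verbatim from the log in this issue.
sample = """\
ggml_extend.hpp:83   - ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3328.50 MiB on device 0: cudaMalloc failed: out of memory
ggml_extend.hpp:83   - ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 3490185472
ggml_extend.hpp:1691 - vae: failed to allocate the compute buffer
ggml_extend.hpp:1961 - vae alloc compute buffer failed
stable-diffusion.cpp:2313 - computing vae decode graph completed, taking 0.10s
"""

failures = find_alloc_failures(sample)
print(len(failures), "allocation failure lines found")
print(round(requested_mib(sample), 2), "MiB requested for the failed buffer")
```

On the lines quoted here, the failed request is roughly 3.3 GiB for the VAE compute buffer alone, which is large relative to the 4 GB of VRAM on this GPU.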
