Description
I compiled using CMAKE_ARGS="-DSD_CUDA=ON" pip install stable-diffusion-cpp-python, as per the instructions. I'm using an RTX 3050 Laptop GPU (4 GB VRAM) on a fresh Ubuntu install with 16 GB RAM.
When running the Z-Image example I get a grey image. At first I thought it had something to do with the VAE, but I ruled that out. Changing the size to 512x512 resulted in a good image: 768x768 = grey, 512x768 = OK.
The script does not crash with an OOM and raises no error; it just produces a grey image.
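Since the run finishes without raising anything, the only way to catch this in a script is to look at the pixels. A minimal sketch of such a check, assuming the output has already been read into a sequence of (R, G, B) tuples; `is_flat_image` and `tolerance` are hypothetical names used for illustration, not part of stable-diffusion-cpp-python:

```python
def is_flat_image(pixels, tolerance=2):
    """Return True if every channel varies by at most `tolerance` across all pixels."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels given")
    # Regroup [(r, g, b), ...] into one tuple per channel: (all r), (all g), (all b).
    channels = list(zip(*pixels))
    return all(max(channel) - min(channel) <= tolerance for channel in channels)


# Illustrative data only: a uniform grey frame and a frame with real content.
grey = [(128, 128, 128)] * 16
photo = [(10, 20, 30), (200, 180, 90), (128, 128, 128)]

print(is_flat_image(grey))   # True
print(is_flat_image(photo))  # False
```

A small tolerance is used instead of strict equality so that a frame with slight rounding noise still counts as flat.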
Below is the log:
$ python zimage.py
stable-diffusion.cpp:160 - Using CUDA backend
ggml_extend.hpp:77 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_extend.hpp:77 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_extend.hpp:77 - ggml_cuda_init: found 1 CUDA devices:
ggml_extend.hpp:77 - Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
stable-diffusion.cpp:235 - loading diffusion model from 'models/z_image_turbo-Q3_K.gguf'
model.cpp:370 - load models/z_image_turbo-Q3_K.gguf using gguf format
model.cpp:412 - init from 'models/z_image_turbo-Q3_K.gguf'
stable-diffusion.cpp:282 - loading llm from 'models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
model.cpp:370 - load models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf using gguf format
model.cpp:412 - init from 'models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
stable-diffusion.cpp:296 - loading vae from 'models/ae.safetensors'
model.cpp:373 - load models/ae.safetensors using safetensors format
model.cpp:503 - init from 'models/ae.safetensors', prefix = 'vae.'
stable-diffusion.cpp:312 - Version: Z-Image
stable-diffusion.cpp:340 - Weight type stat: f32: 640 | q8_0: 22 | q3_K: 180 | q4_K: 216 | q6_K: 37
stable-diffusion.cpp:341 - Conditioner weight type stat: f32: 145 | q4_K: 216 | q6_K: 37
stable-diffusion.cpp:342 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q3_K: 180
stable-diffusion.cpp:343 - VAE weight type stat: f32: 244
stable-diffusion.cpp:345 - ggml tensor size = 400 bytes
llm.hpp:285 - merges size 151387
llm.hpp:317 - vocab size: 151669
llm.hpp:1130 - llm: num_layers = 36, vocab_size = 151936, hidden_size = 2560, intermediate_size = 9728
stable-diffusion.cpp:540 - Using flash attention in the diffusion model
ggml_extend.hpp:1883 - qwen3 params backend buffer size = 3555.38 MB(RAM) (398 tensors)
ggml_extend.hpp:1883 - z_image params backend buffer size = 2997.90 MB(RAM) (453 tensors)
ggml_extend.hpp:1883 - vae params backend buffer size = 160.00 MB(RAM) (244 tensors)
stable-diffusion.cpp:688 - loading weights
model.cpp:1351 - using 10 threads for model loading
model.cpp:1373 - loading tensors from models/z_image_turbo-Q3_K.gguf
|====================> | 453/1095 - 565.54it/s
model.cpp:1373 - loading tensors from models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf
|======================================> | 851/1095 - 417.36it/s
model.cpp:1373 - loading tensors from models/ae.safetensors
|==================================================| 1095/1095 - 488.84it/s
model.cpp:1574 - loading tensors completed, taking 2.24s (process: 0.00s, read: 1.45s, memcpy: 0.00s, convert: 0.08s, copy_to_backend: 0.00s)
stable-diffusion.cpp:720 - finished loaded file
stable-diffusion.cpp:777 - total params memory size = 6713.28MB (VRAM 6713.28MB, RAM 0.00MB): text_encoders 3555.38MB(VRAM), diffusion_model 2997.90MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
stable-diffusion.cpp:860 - running in FLOW mode
System Info:
SSE3 = 1 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
stable-diffusion.cpp:3163 - generate_image 1024x512
stable-diffusion.cpp:3197 - sampling using Euler method
denoiser.hpp:364 - get_sigmas with discrete scheduler
stable-diffusion.cpp:3320 - TXT2IMG
conditioner.hpp:1679 - parse '<|im_start|>user
A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>user
', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic', 1], ['<|im_end|>
<|im_start|>assistant
', 1], ]
llm.hpp:259 - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", "ĠThe", "Ġcity", "Ġlights", "Ġare", "Ġa", "Ġchaotic", "Ġblur", "Ġof", "Ġneon", "Ġorange", "Ġand", "Ġcool", "Ġblue", ",", "Ġreflecting", "Ġon", "Ġthe", "Ġwet", "Ġasphalt", ".", "ĠThe", "Ġscene", "Ġev", "okes", "Ġa", "Ġsense", "Ġof", "Ġbeing", "Ġa", "Ġsingle", "Ġcomponent", "Ġin", "Ġa", "Ġvast", "Ġmachine", ".", "ĠSuper", "im", "posed", "Ġover", "Ġthe", "Ġimage", "Ġin", "Ġa", "Ġsleek", ",", "Ġmodern", ",", "Ġslightly", "Ġglitch", "ed", "Ġfont", "Ġis", "Ġthe", "Ġphilosophical", "Ġquote", ":", "Ġ'", "THE", "ĠCITY", "ĠIS", "ĠA", "ĠC", "IR", "CU", "IT", "ĠBOARD", ",", "ĠAND", "ĠI", "ĠAM", "ĠA", "ĠBRO", "KEN", "ĠTRANS", "IST", "OR", ".'", "Ġ--", "Ġmo", "ody", ",", "Ġatmospheric", ",", "Ġprofound", ",", "Ġdark", "Ġacademic", ]
llm.hpp:259 - split prompt "<|im_end|>
<|im_start|>assistant
" to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
ggml_extend.hpp:1796 - qwen3 offload params (3555.38 MB, 398 tensors) to runtime backend (CUDA0), taking 0.54s
ggml_extend.hpp:1698 - qwen3 compute buffer size: 13.34 MB(VRAM)
conditioner.hpp:1892 - computing condition graph completed, taking 674 ms
stable-diffusion.cpp:2941 - get_learned_condition completed, taking 676 ms
stable-diffusion.cpp:3052 - generating image: 1/1 - seed 42
ggml_extend.hpp:1796 - z_image offload params (2997.93 MB, 453 tensors) to runtime backend (CUDA0), taking 0.44s
ggml_extend.hpp:1698 - z_image compute buffer size: 292.48 MB(VRAM)
|==================================================| 20/20 - 2.48s/it
stable-diffusion.cpp:3094 - sampling completed, taking 49.60s
stable-diffusion.cpp:3105 - generating 1 latent images completed, taking 49.60s
stable-diffusion.cpp:3108 - decoding 1 latents
ggml_extend.hpp:1796 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.09s
ggml_extend.hpp:83 - ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3328.50 MiB on device 0: cudaMalloc failed: out of memory
ggml_extend.hpp:83 - ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 3490185472
ggml_extend.hpp:1691 - vae: failed to allocate the compute buffer
ggml_extend.hpp:1961 - vae alloc compute buffer failed
stable-diffusion.cpp:2313 - computing vae decode graph completed, taking 0.10s
stable-diffusion.cpp:3118 - latent 1 decoded, taking 0.10s
stable-diffusion.cpp:3122 - decode_first_stage completed, taking 0.10s
stable-diffusion.cpp:3428 - generate_image completed in 50.37s
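Note that the log above does contain a failed allocation during VAE decode (cudaMalloc failed: out of memory, followed by "vae alloc compute buffer failed"), even though the run carries on and reports completion. A minimal sketch for scanning such a log for failed buffer allocations; the regular expression is an assumption based only on the wording of the lines above, and `find_failed_allocations` is a hypothetical helper, not part of the library:

```python
import re

# Assumed line shape, taken from the log above:
#   "... failed to allocate CUDA0 buffer of size 3490185472"
FAILED_ALLOC = re.compile(r"failed to allocate (\S+) buffer of size (\d+)")


def find_failed_allocations(log_text):
    """Return a (backend, size_in_MiB) pair for every failed buffer allocation."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_ALLOC.search(line)
        if match:
            backend = match.group(1)
            size_bytes = int(match.group(2))
            failures.append((backend, size_bytes / (1024 * 1024)))
    return failures


# Three lines copied from the log in this report.
sample_log = """\
ggml_extend.hpp:83 - ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3328.50 MiB on device 0: cudaMalloc failed: out of memory
ggml_extend.hpp:83 - ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 3490185472
ggml_extend.hpp:1691 - vae: failed to allocate the compute buffer
"""

for backend, mib in find_failed_allocations(sample_log):
    print(f"{backend}: failed to allocate {mib:.2f} MiB")
```

On the lines above this reports a single failed request of about 3328.50 MiB on CUDA0, which by itself is most of the 4 GB of VRAM on this GPU.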