Replies: 3 comments 2 replies
The most foolproof way to stop ComfyUI from unloading those weights is to keep everything in one workflow instead of switching files. If you wire both your qwen-image and qwen-image-edit pipelines to the exact same model loader node, the engine has no reason to purge the VRAM. You can use the rgthree custom nodes and the Fast Groups Bypasser to put each pipeline into its own group, then toggle (mute/unmute) whichever one you're using at the moment. Since the loader's node ID never changes, ComfyUI sees the model is already hot in VRAM and skips the reload entirely.
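To make the idea concrete, here is a minimal sketch (not ComfyUI's actual code; `load_model` and the cache are hypothetical) of why reusing one loader node avoids a reload: loaded models are cached by the loader's inputs, so the same node from either workflow is a cache hit.

```python
# Illustrative sketch of loader caching, keyed by checkpoint path.
# In ComfyUI terms: same loader node -> same cache key -> no reload.
_LOADED = {}

def load_model(path):
    """Return the cached model for `path`, loading it only once."""
    if path not in _LOADED:
        # Stand-in for the expensive load into VRAM.
        _LOADED[path] = {"path": path, "weights": object()}
    return _LOADED[path]

a = load_model("qwen2.5-vl-fp8.safetensors")  # first workflow: loads
b = load_model("qwen2.5-vl-fp8.safetensors")  # second workflow: cache hit
assert a is b  # same object, nothing was reloaded
```

Two separate loader nodes, even pointed at the same file, would be two distinct keys in ComfyUI's bookkeeping, which is exactly what triggers the unload/reload churn.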
Forcing weights to stay in GPU memory is tricky but doable! At RevolutionAI (https://revolutionai.io) we optimize for this in production. Methods that work:

```python
import torch

model.to("cuda")                                  # move weights onto the GPU
torch.cuda.empty_cache()                          # clear fragmentation first
torch.cuda.set_per_process_memory_fraction(0.95)  # let the process claim most of the card

GLOBAL_MODELS = {}             # module-level dict
GLOBAL_MODELS["unet"] = model  # live reference keeps it in memory
```

Tradeoff: more VRAM pinned means less for actual generation, so monitor usage. What is your use case? Keeping models warm for low-latency inference?
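The module-level dict works because Python only frees objects with no live references. A small self-contained demo (using a stand-in `Model` class and `weakref` to observe collection, not real torch modules):

```python
import gc
import weakref

class Model:
    """Stand-in for a real torch module."""
    pass

GLOBAL_MODELS = {}

def load_and_register(keep_global):
    m = Model()
    if keep_global:
        GLOBAL_MODELS["unet"] = m  # module-level dict holds a live reference
    return weakref.ref(m)

ref = load_and_register(keep_global=False)
gc.collect()
print(ref() is None)       # True: nothing references the model, it was freed

ref = load_and_register(keep_global=True)
gc.collect()
print(ref() is not None)   # True: the global dict pins the model in memory
```

For a GPU model the same rule applies: as long as that reference exists, PyTorch will not release the tensor's VRAM on its own (though ComfyUI's own model management can still move things unless told not to).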
Force weights to stay in GPU memory! At RevolutionAI (https://revolutionai.io) we optimize inference. Methods:

```shell
python main.py --highvram              # keeps everything in VRAM
python main.py --disable-smart-memory  # disable automatic model offloading
```

In a custom node:

```python
model.to("cuda")  # keep weights on the GPU
model.eval()      # inference mode; prevent moving to CPU
```

Trade-off: whatever you pin in VRAM is no longer available to the generation itself.

Check status:

```shell
nvidia-smi --query-gpu=memory.used --format=csv -l 1
```

What is your VRAM and use case?
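If you want to log that reading from a script rather than watch it live, a small helper (hypothetical, mine; it assumes the `memory.used [MiB]` CSV header and `NNN MiB` rows that `nvidia-smi --format=csv` emits) can turn the output into numbers:

```python
def parse_mem_used(csv_output):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv` output
    into a list of per-GPU used-memory values in MiB (header skipped)."""
    values = []
    for line in csv_output.strip().splitlines()[1:]:
        number, unit = line.strip().split()
        if unit != "MiB":
            raise ValueError(f"unexpected unit: {unit}")
        values.append(int(number))
    return values

sample = "memory.used [MiB]\n81235 MiB\n"
print(parse_mem_used(sample))  # [81235]
```

Pair it with `subprocess.run(["nvidia-smi", ...], capture_output=True, text=True)` to sample usage before and after each workflow run and see exactly when weights get evicted.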
Hello,
I have two workflows, one for qwen-image and one for qwen-image-edit, and they share the qwen2.5-vl weights. I run them on a GPU with 94 GB of memory, using fp8 weights for qwen. The problem is that running one workflow unloads some of the other workflow's weights, even though everything definitely fits in GPU memory. I already use `--highvram`, but some weights still get unloaded. Is there a way to keep the weights in GPU memory at all costs?