Do you have a recommended scaling ratio to adjust the latents from your trained encoder models, similar to the 0.18215 factor people usually multiply into latents from SDXL's VAE?
Specifically, I ran into an issue: when I use your trained encoder to encode images and then feed the latents into my own VAR-like model, the trained model's outputs settle into a significantly narrow range (around 0.3-0.5, for example), which makes the generated images generally too bright.
I'm currently using your VQ model trained on IMed, loading the weights under the same settings, with color fundus photography images and no typical normalization (which could be part of the issue, but I checked the scales produced by the model and didn't find a similar brightness problem).
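For now, my best guess is to follow the same convention SD used to derive 0.18215, i.e. rescale the latents to roughly unit standard deviation over a sample of my training set. A minimal sketch of what I mean (`encode_fn` here is a hypothetical stand-in for your encoder's encode call, and I'm using NumPy arrays just for illustration):

```python
import numpy as np

def estimate_latent_scale(encode_fn, batches):
    """Estimate a scale factor s such that s * z has unit standard
    deviation, analogous to SD's 0.18215 = 1 / std(latents)."""
    # Flatten latents from a sample of batches into one long vector.
    flat = np.concatenate(
        [np.asarray(encode_fn(b)).ravel() for b in batches]
    )
    return 1.0 / flat.std()
```

Is this the right approach for your encoder, or do you have a specific recommended value?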
I really appreciate your help, and admire your work.