It seems the released pretrained model has a different structure from the code in this repo.
For example, there is actually no `model.vision_model` attribute on the retrieval BLIP model, so there is no way to use `blip-itm-base-flickr`, which contains only this attribute but no `visual_encoder`.
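In case it helps others hitting the same mismatch, here is a rough workaround sketch: renaming the checkpoint's state-dict keys from one prefix to the other before loading. This is untested, and the `remap_state_dict` helper below is hypothetical (not part of this repo or the released checkpoint); the prefix direction may need to be swapped depending on which naming your code expects.

```python
def remap_state_dict(state_dict,
                     old_prefix="visual_encoder.",
                     new_prefix="vision_model."):
    """Return a copy of state_dict with keys under old_prefix
    renamed to new_prefix; all other keys are kept unchanged."""
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            remapped[new_prefix + key[len(old_prefix):]] = value
        else:
            remapped[key] = value
    return remapped

# Toy example with strings standing in for tensors:
ckpt = {
    "visual_encoder.blocks.0.attn.qkv.weight": "w0",
    "text_encoder.embeddings.weight": "w1",
}
print(remap_state_dict(ckpt))
# the vision key is renamed, the text key is untouched
```

With a real checkpoint one would apply this to the loaded dict (e.g. `model.load_state_dict(remap_state_dict(torch.load(path)), strict=False)`), but whether that is sufficient depends on how far the two architectures actually diverge, hence the request below.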
Please update the related code, or give us detailed instructions on how to use it.
Thank you very much for your time and assistance.