When training the VQVAE with the VQ-4096 config, I run into a state_dict mismatch. The pre-trained model is vit_base_patch14_dinov2 (14x14 patches), but our model definition might be based on 16x16 patches. Could you tell me how to handle this?
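To make the suspected mismatch concrete, here is a minimal sketch of which tensor shapes would differ between a 14x14 and a 16x16 patch ViT. It is only an illustration under assumptions: the key names `patch_embed.proj.weight` and `pos_embed` follow the usual timm ViT naming, the embedding width of 768 is the standard ViT-Base value, and the 224 input resolution and both helper functions are hypothetical, not taken from this repository.

```python
def vit_shapes(patch_size, img_size=224, embed_dim=768, in_chans=3):
    """Shapes of the two ViT tensors that depend on the patch size.

    Assumptions: timm-style key names, one class token, square input.
    img_size=224 is a placeholder, not the value used by the repo.
    """
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size  # patches per side
    return {
        # Conv kernel of the patch embedding: one kernel per patch.
        "patch_embed.proj.weight": (embed_dim, in_chans, patch_size, patch_size),
        # One position per patch, plus one for the class token.
        "pos_embed": (1, grid * grid + 1, embed_dim),
    }


def mismatched_keys(checkpoint_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for shapes that differ."""
    return {
        key: (checkpoint_shapes[key], model_shapes[key])
        for key in checkpoint_shapes
        if key in model_shapes and checkpoint_shapes[key] != model_shapes[key]
    }


if __name__ == "__main__":
    pretrained = vit_shapes(patch_size=14)  # like vit_base_patch14_dinov2
    ours = vit_shapes(patch_size=16)        # suspected model definition
    for key, (ckpt, model) in mismatched_keys(pretrained, ours).items():
        print(f"{key}: checkpoint {ckpt} vs model {model}")
```

If this is what is happening, the error should name exactly these two kinds of keys, and the remaining transformer block weights should load without complaint.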