padding embedding in my mind is usually 1 shared token, but GLIDE use nn.Parameter. My question is what about 128K context-window?Why not share padding token?
Thanks for your attention
code is at https://github.com/openai/glide-text2im/blob/main/glide_text2im/text2im_model.py#L10