I use nn.Softmax(dim=-1) to compute softmax, and I get different outputs for two inputs that share the same first four logits:
a = [-3.6180e-01, 6.6926e-01, 1.2248e+01, -9.5795e-01]
b = [-3.6180e-01, 6.6926e-01, 1.2248e+01, -9.5795e-01, -9.5795e-01]
softmax(a) = [3.3403e-06, 9.3662e-06, 9.9999e-01, 1.8402e-06]
softmax(b) = [3.3403e-06, 9.3661e-06, 9.9998e-01, 1.8402e-06, 1.8402e-06]
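For reference, here is a minimal sketch along the lines of what I run (assuming the logits are passed in as plain 1-D tensors) that reproduces the numbers above:

```python
import torch
import torch.nn as nn

softmax = nn.Softmax(dim=-1)

a = torch.tensor([-3.6180e-01, 6.6926e-01, 1.2248e+01, -9.5795e-01])
# b is a with one extra logit appended at the end
b = torch.tensor([-3.6180e-01, 6.6926e-01, 1.2248e+01, -9.5795e-01, -9.5795e-01])

# prints approximately [3.3403e-06, 9.3662e-06, 9.9999e-01, 1.8402e-06]
print(softmax(a))
# prints approximately [3.3403e-06, 9.3661e-06, 9.9998e-01, 1.8402e-06, 1.8402e-06]
print(softmax(b))
```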
These different softmax results lead to different sentence embeddings, and sometimes the embeddings differ a lot. I tested with the stock transformers library and cannot reproduce the issue; the bug only appears in a version of transformers modified by our company. Any help is appreciated!