Due to engineering constraints, I had to manually export the encoder and decoder of the MarianMT model, as well as the decoder with caching.
The encoder and the plain decoder export without any problems.
However, when exporting the decoder with cache, I found that the exported ONNX model produces different inference results depending on which dummy inputs I use for the export.
The output of this model also differs significantly from the output of the model exported with Optimum.
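
To make "significantly different" concrete, here is a minimal sketch of how the two sets of outputs could be compared numerically. The helper name `compare_outputs` and the tolerance `atol=1e-4` are assumptions chosen for illustration, not part of any library. It only assumes that each model's outputs are available as a list of NumPy arrays in the same order.

```python
import numpy as np


def compare_outputs(outputs_a, outputs_b, atol=1e-4):
    """Compare two lists of arrays output by output.

    Hypothetical helper: reports a shape mismatch, or the maximum
    absolute difference when the shapes agree.
    """
    report = []
    for index, (a, b) in enumerate(zip(outputs_a, outputs_b)):
        a, b = np.asarray(a), np.asarray(b)
        if a.shape != b.shape:
            # Differing shapes often point at axes fixed during export.
            report.append({"index": index, "status": "shape mismatch",
                           "shapes": (a.shape, b.shape), "max_abs_diff": None})
            continue
        max_diff = float(np.max(np.abs(a - b))) if a.size else 0.0
        status = "ok" if max_diff <= atol else "value mismatch"
        report.append({"index": index, "status": status,
                       "shapes": (a.shape, b.shape), "max_abs_diff": max_diff})
    return report
```

The two lists could come, for example, from calling `run(None, feeds)` on an `onnxruntime.InferenceSession` for each model with identical feeds. A shape mismatch versus a pure value mismatch may help narrow down whether the difference comes from the dummy input shapes or from the computation itself.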
Does anyone know where the problem might be, or has anyone run into this before? Any guidance would be appreciated.