Based on a Slack discussion between @ttattl and me, I now better understand the reasoning behind applying data augmentation to the validation & test datasets. We should definitely consider some sort of augmentation technique for the test dataset as well in the future.
One way could be Test-Time Augmentation (TTA): just as data augmentation modifies the training set, TTA performs random modifications to the test images. Instead of showing each regular, "clean" image to the trained model only once, we show it several augmented versions of that image. We then average the predictions for each image and take that average as our final guess. An example PyTorch implementation can be found here: https://github.com/qubvel/ttach
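The averaging step described above can be sketched as follows. This is a minimal, library-agnostic illustration (NumPy only, not the ttach API); the augmentation list, `tta_predict`, and the toy model are all hypothetical names chosen for this example.

```python
import numpy as np

# Illustrative label-preserving augmentations for a 2-D image array:
# the identity view plus horizontal and vertical flips.
AUGMENTATIONS = [
    lambda img: img,
    lambda img: np.fliplr(img),
    lambda img: np.flipud(img),
]

def tta_predict(model, image, augmentations=AUGMENTATIONS):
    """Run `model` on each augmented view of `image` and average the
    resulting class-probability vectors into one final prediction."""
    preds = np.stack([model(aug(image)) for aug in augmentations])
    return preds.mean(axis=0)

# Toy stand-in "model": a softmax over the mean intensity of each
# image half, just so the sketch runs end to end.
def toy_model(img):
    left = img[:, : img.shape[1] // 2]
    right = img[:, img.shape[1] // 2 :]
    logits = np.array([left.mean(), right.mean()])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

image = np.arange(16, dtype=float).reshape(4, 4)
final_prediction = tta_predict(toy_model, image)
```

In a real pipeline the augmentations must preserve the label (flips, small crops, mild color jitter), and for tasks like segmentation the outputs would also need to be inverse-transformed back before averaging, which is what a library such as ttach handles.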