Current behaviour:
The spoofing_protection for the content_type validator is supposed to perform a deep validation on the uploaded file to ensure its content is the one declared through its metadata. This feature is supposed to counteract a file, which headers can say it's a png, pdf or whatever, but its content is something entirely else.
The current option name and documentation creates an expectation of deep validation that the implementation doesn’t actually deliver. Right now, we open the uploaded file with the UNIX file command, and retrieve the deducted content_type from such command. BUT, the file command is only performing a short validation, kind of close of what Marcel gem is doing inside ActiveStorage (file uses mostly libmagic to determine a content_type).
This check is not enough to declare the file as "spoofed free".
Expected behaviour:
The perfect, but entirely theoretical, approach would be to analyse the whole file content and say : "ok it's a not spoofed x y z", but, to my knowledge, this tool does not exist. A more pragmatical approach would be to open the uploaded file with the right tool (e.g. audio file with ffmpeg) and let the tool say if there is an issue with the file, like the processable_file validator is doing. Therefore we delegate the heavy work, we just gently catch the error.
The only pitfall here would be that the spoofing_protection option would do a bit more than announced: it would also tag malformed, but not spoofed, files as spoofed, which is not the goal. IMO we are sitting between two chairs here, I would prefer to also rename the option to deep_validation.
We could remove the processable_file validator altogether, and rather implement this validation through the deep_validation option for the content_type validator. This has the benefit of:
- Solve the issue with
spoofing_protection option not doing it's job
- Making the gem's API simpler (remove 1 validator)
Proposal Summary
- Deprecate
spoofing_protection option for content_type validator
- Introduce
deep_validation option for content_type validator
deep_validation will:
- Delegate to underlying processors appropriate for type (image/video/audio/pdf) (ie do exactly what
processable_file validator is doing)
- Fail if parse/processing fails for allowed types
- Be opt-in due to expense
- Remove/merge
processable_file into deep_validation (drops the need for a separate validator)
To not forget
- Update the readme extensively to reflect these changes
- Find a way to update Rails custom metadata blobs that the gem sets to reflect these changes
Current behaviour:
The
spoofing_protectionfor thecontent_typevalidator is supposed to perform a deep validation on the uploaded file to ensure its content is the one declared through its metadata. This feature is supposed to counteract a file, which headers can say it's a png, pdf or whatever, but its content is something entirely else.The current option name and documentation creates an expectation of deep validation that the implementation doesn’t actually deliver. Right now, we open the uploaded file with the UNIX
filecommand, and retrieve the deducted content_type from such command. BUT, thefilecommand is only performing a short validation, kind of close of whatMarcelgem is doing inside ActiveStorage (file uses mostly libmagic to determine a content_type).This check is not enough to declare the file as "spoofed free".
Expected behaviour:
The perfect, but entirely theoretical, approach would be to analyse the whole file content and say : "ok it's a not spoofed x y z", but, to my knowledge, this tool does not exist. A more pragmatical approach would be to open the uploaded file with the right tool (e.g. audio file with ffmpeg) and let the tool say if there is an issue with the file, like the
processable_filevalidator is doing. Therefore we delegate the heavy work, we just gently catch the error.The only pitfall here would be that the
spoofing_protectionoption would do a bit more than announced: it would also tag malformed, but not spoofed, files as spoofed, which is not the goal. IMO we are sitting between two chairs here, I would prefer to also rename the option todeep_validation.We could remove the
processable_filevalidator altogether, and rather implement this validation through thedeep_validationoption for thecontent_typevalidator. This has the benefit of:spoofing_protectionoption not doing it's jobProposal Summary
spoofing_protectionoption forcontent_typevalidatordeep_validationoption forcontent_typevalidatordeep_validationwill:processable_filevalidator is doing)processable_fileintodeep_validation(drops the need for a separate validator)To not forget