Expected behaviour
As part of the change described in #404, we would like to support additional content_types for deep analysis. We currently handle:
- image (vips/minimagick)
- video (ffmpeg)
- audio (ffmpeg)
- pdf (poppler)
The gem code is clearly designed to accommodate new content_types. In theory, we could add the following content_types without too much work:
- csv (ruby csv with binary detection, encoding check, parsing 1 or 2 lines) => text/csv + application/csv + application/vnd.ms-excel + text/tab-separated-values
- json (ruby json)
- ndjson (line by line ruby json, with a sanity cap)
- xml (nokogiri XML)
- yaml (ruby yaml with safe_load)
- txt (internal check with binary detection)
- xlsx (ruby zip (>= 3.1) should reject zip bombs + nokogiri XML)
- docx (ruby zip (>= 3.1) should reject zip bombs + nokogiri XML)
- zip (ruby zip (>= 3.1) should reject zip bombs)
- gzip (ruby zlib should reject zip bombs)
- epub (ruby zip (>= 3.1) should reject zip bombs)
- html, xhtml (nokogiri HTML)
- svg (nokogiri XML with extra checks)
- markdown (internal check with binary detection)
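To make the "binary detection" and "sanity cap" ideas above more concrete, here is a minimal sketch of what the stdlib-only checks (txt, json, ndjson) could look like. The module name `DeepAnalysisSketch`, the method names, and the `MAX_NDJSON_LINES` value are all hypothetical and not part of the gem. It also assumes content is expected to be UTF-8, so UTF-16 text would be flagged as binary. The zip-based and nokogiri-based checks are left out because they depend on external gems.

```ruby
require "json"

# Hypothetical helpers for illustration only; names and limits are assumptions.
module DeepAnalysisSketch
  MAX_NDJSON_LINES = 1_000 # assumed sanity cap, to be tuned

  module_function

  # Binary detection: a NUL byte or invalid UTF-8 marks the content as binary.
  def binary?(data)
    bytes = data.b # binary-encoded copy, the original is left untouched
    return true if bytes.include?("\x00".b)

    !bytes.force_encoding(Encoding::UTF_8).valid_encoding?
  end

  # Returns a UTF-8 tagged copy of the data.
  def utf8(data)
    data.dup.force_encoding(Encoding::UTF_8)
  end

  # txt / markdown: anything that is not detected as binary.
  def valid_text?(data)
    !binary?(data)
  end

  # json: must be text and parse as a single JSON document.
  def valid_json?(data)
    return false if binary?(data)

    JSON.parse(utf8(data))
    true
  rescue JSON::ParserError
    false
  end

  # ndjson: parse line by line, stopping once max_lines lines were checked.
  def valid_ndjson?(data, max_lines: MAX_NDJSON_LINES)
    return false if binary?(data)

    checked = 0
    utf8(data).each_line do |line|
      next if line.strip.empty?
      break if checked >= max_lines

      JSON.parse(line)
      checked += 1
    end
    checked.positive?
  rescue JSON::ParserError
    false
  end
end
```

With the cap in place, lines beyond `max_lines` are never parsed, so a file that only becomes invalid later on would still pass. That trade-off between cost and strictness would need to be decided per content_type.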
Not to forget
- Update readme accordingly