Releases: Unstructured-IO/unstructured
0.21.5
What's Changed
- feat: custom fallback for language detection by @claytonlin1110 in #4238
- Add Github action for time regressions by @aadland6 in #4261
- fix: relax lower bound for pdfminer.six by @badGarnet in #4262
New Contributors
- @claytonlin1110 made their first contribution in #4238
- @aadland6 made their first contribution in #4261
Full Changelog: 0.21.2...0.21.5
0.21.2
0.21.1
0.21.0
Fixes
- Replace NLTK with spaCy to remediate CVE-2025-14009: NLTK's downloader uses zipfile.extractall() without path validation, enabling RCE via malicious packages (CVSS 10.0, no patch available). spaCy models install as pip packages, eliminating the vulnerable downloader entirely.
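The flaw described above belongs to the path-traversal ("zip slip") class: an archive member named something like ../../x or /etc/x resolves to a location outside the intended directory. The sketch below shows the kind of per-member path validation the note says was missing. It is a minimal illustration assuming a local zip file; `safe_extract` is a hypothetical helper and is not part of NLTK, spaCy, or unstructured (the actual fix in 0.21.0 was to drop the NLTK downloader altogether).

```python
import os
import zipfile


def safe_extract(zip_path: str, dest: str) -> list[str]:
    """Extract zip_path into dest, refusing any member that would escape dest.

    Hypothetical helper for illustration only.
    """
    dest_root = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        # Validate every member before writing anything to disk.
        for member in names:
            target = os.path.realpath(os.path.join(dest_root, member))
            # A safe target must live under dest_root; "../" or absolute
            # member names produce a common path above dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member!r}")
        zf.extractall(dest_root)
    return names
```

Validating all members first means a malicious archive is rejected before any file is written, rather than partway through extraction.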
0.20.8
What's Changed
- fix: set max decompressed size for elements JSON by @qued in #4244
- fix: update dependencies by @badGarnet in #4247
Full Changelog: 0.20.6...0.20.8
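Capping the decompressed size, as in #4244, protects against decompression bombs: a small compressed payload that inflates to gigabytes. The sketch below assumes the elements JSON arrives gzip-compressed; the function names and the 200 MiB cap are hypothetical and are not the API or limit chosen in that PR. It asks zlib for one byte more than the cap, so an oversized payload is detected without ever inflating it in full.

```python
import json
import zlib

# Hypothetical cap for illustration; not the limit chosen in #4244.
MAX_DECOMPRESSED_BYTES = 200 * 1024 * 1024


def bounded_gunzip(data: bytes, max_bytes: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    """Decompress a gzip payload, refusing to produce more than max_bytes."""
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Request at most max_bytes + 1 bytes of output: getting the extra byte
    # proves the payload is too large without materialising all of it.
    out = d.decompress(data, max_bytes + 1)
    if len(out) > max_bytes:
        raise ValueError(f"decompressed size exceeds {max_bytes} bytes")
    if not d.eof:
        raise ValueError("truncated or incomplete gzip stream")
    return out


def load_elements_json(data: bytes, max_bytes: int = MAX_DECOMPRESSED_BYTES) -> list:
    """Parse gzip-compressed elements JSON under a decompressed-size cap."""
    return json.loads(bounded_gunzip(data, max_bytes))
```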
0.20.6
What's Changed
- Automate pypi publishing by @PastelStorm in #4239
- fix: remove duplicate characters caused by fake bold rendering in PDFs by @bittoby in #4215
- Improve fast partition cold start by @CyMule in #4242
- fix: gracefully handle invalid HTML string during chunking by @badGarnet in #4243
- fix: remap parent id after hashing by @badGarnet in #4245
New Contributors
Full Changelog: 0.20.1...0.20.6
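The "remap parent id after hashing" fix (#4245) addresses a general pitfall: when element IDs are replaced by hashes, any parent_id that still holds an old ID points at nothing. A minimal sketch over plain dicts, assuming each element has element_id, text, and an optional parent_id; `rehash_ids` and its hashing scheme are hypothetical and do not reproduce how unstructured computes its IDs.

```python
import hashlib


def rehash_ids(elements: list[dict]) -> list[dict]:
    """Replace each element_id with a hash and rewrite parent_id to match.

    Hypothetical helper; the hash input (position + text) is an assumption.
    """
    # First pass: build old-id -> new-id mapping for every element.
    id_map = {}
    for i, el in enumerate(elements):
        digest = hashlib.sha256(f"{i}:{el['text']}".encode()).hexdigest()[:32]
        id_map[el["element_id"]] = digest

    # Second pass: apply the mapping to both the element's own id and its
    # parent reference, copying so the input list is left untouched.
    remapped = []
    for el in elements:
        new = dict(el)
        new["element_id"] = id_map[el["element_id"]]
        parent = el.get("parent_id")
        if parent is not None:
            # Unknown parents are kept as-is rather than silently dropped.
            new["parent_id"] = id_map.get(parent, parent)
        remapped.append(new)
    return remapped
```

The two-pass structure is the point: parents can only be remapped once every element's new ID is known.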
0.20.2
Release 0.20.2
0.20.1
What's Changed
- Add Python 3.11 and 3.13 support by @PastelStorm in #4236
- Use bigger runner to publish images by @PastelStorm in #4237
Full Changelog: 0.19.3...0.20.1
0.19.3
What's Changed
- feat: add group_elements_by_parent_id utility function by @MkDev11 in #4207
- Preserve newlines in Table and TableChunk elements during PDF partitioning by @eureka928 in #4214
- Fix: make pdf image dpi consistent by @badGarnet in #4217
- chore: bump dependencies for 0.18.34 by @luke-kucing in #4221
- feat: increase PIL's max image pixel value for pdf partition by @badGarnet in #4220
- Migrate to uv by @PastelStorm in #4226
- Fix ARM64 paddlepaddle image builder bug by @PastelStorm in #4228
- Fix Docker ARM64 image failure, use 8-core github runners by @PastelStorm in #4232
- Fix ARM64 image issues by @PastelStorm in #4233
Full Changelog: 0.18.32...0.19.3
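Release 0.19.3 adds a group_elements_by_parent_id utility (#4207). The underlying idea, bucketing elements under the ID of their parent, can be sketched over plain dicts. `group_by_parent` and the dict layout here are hypothetical and do not reproduce the library function's signature or return type.

```python
from collections import defaultdict
from typing import Any, Optional


def group_by_parent(
    elements: list[dict[str, Any]],
) -> dict[Optional[str], list[dict[str, Any]]]:
    """Bucket elements by parent_id, preserving input order within each bucket."""
    groups: defaultdict = defaultdict(list)
    for el in elements:
        # Top-level elements (no parent) land under the None key.
        groups[el.get("parent_id")].append(el)
    return dict(groups)
```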
0.18.32
What's Changed
- feat: put pdfium call behind a threadlock by @badGarnet in #4211
Full Changelog: 0.18.31...0.18.32
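Putting the pdfium call behind a lock (#4211) serializes access to a native library that is not safe to call from several threads at once. A minimal sketch of the pattern, assuming a single module-level lock; `_PDFIUM_LOCK` and `call_locked` are hypothetical names, and `render_fn` stands in for whatever pdfium-backed call needs protecting.

```python
import threading
from typing import Any, Callable

# One lock shared by every thread in the process (hypothetical name).
_PDFIUM_LOCK = threading.Lock()


def call_locked(render_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run render_fn while holding the shared lock, so calls never overlap."""
    with _PDFIUM_LOCK:
        return render_fn(*args, **kwargs)
```

The trade-off is throughput: threads still parallelize the rest of their work, but the locked section runs one call at a time.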