Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler v1.12.4
What's Changed
- track crawlIds included in each --collection directory by @ikreymer in #1005
- qa: don't add pages from WACZ files that have fromDependency set by @ikreymer in #1010
- fix hashtag in seed URLs for single page / non-spa scopes by @ikreymer in #1013
- add reference to external WACZ per revisit record by @ikreymer in #1009
Full Changelog: v1.12.3...v1.12.4
Browsertrix Crawler v1.12.3
What's Changed
- make sure direct-fetched pages are not double-counted by @ikreymer in #996
- fix --overwrite option by @ikreymer in #999
Full Changelog: v1.12.2...v1.12.3
Browsertrix Crawler v1.12.2
What's Changed
Full Changelog: v1.12.1...v1.12.2
Browsertrix Crawler v1.12.1
Browsertrix Crawler v1.12.0
Major features
- Deduplication with revisit records and a separate index (see the User Docs and the Developer Docs/Architecture)
What's Changed
- Dedup Initial Implementation by @ikreymer in #889
- Fix browser network loading by @ikreymer in #963
- frame behaviors: use frame.evaluate() instead of custom evaluateWithCLI() by @ikreymer in #964
- rollover: check for rollover before writing new records, not after. by @ikreymer in #974
- Support QA with deduped crawls by @ikreymer in #977
- Add Indexer options to commit/cancel single crawl by @ikreymer in #978
- fix regression where behaviors are run prematurely in new pages due to remaining 'framenavigated' listener by @ikreymer in #979
- Fix up comment related to cleaning up event listeners from behaviors by @tw4l in #980
- Add OpenContainers labels to Dockerfile to support Dependabot by @Mr0grog in #972
- Interrupt instead of fail crawl when not fatal by @ikreymer in #973
- Fix streaming response / retry mechanism when loading from browser by @ikreymer in #975
- Upgrade normalize-url to v9.0.0, fix seed decoding issue by @Mr0grog in #984
- redis config tweaks by @ikreymer in #987
- Request original url, store normalized url for comparison check/dedupe by @ikreymer in #986
- deps update for 1.12 by @ikreymer in #988
- Dedupe docs by @ikreymer in #989
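The change in #986 separates the URL that is requested from the URL that is used for comparison. A minimal sketch of that idea follows. The names `comparisonKey` and `SeenSet`, and the normalization rules chosen here (drop the fragment, sort query parameters), are assumptions for illustration only; they are not the crawler's code and do not reproduce what normalize-url does.

```typescript
// Hypothetical helper: builds a key used only for "have we seen this?" checks.
// The rules below are illustrative, not the crawler's actual normalization.
function comparisonKey(originalUrl: string): string {
  const url = new URL(originalUrl);
  // The fragment is never sent to the server, so ignore it when comparing.
  url.hash = "";
  // Sort query parameters so ordering differences do not produce distinct keys.
  url.searchParams.sort();
  return url.href;
}

interface QueueEntry {
  requestUrl: string; // requested exactly as given
  key: string; // normalized form, used only for comparison
}

// Hypothetical in-memory "seen" set keyed by the normalized URL.
class SeenSet {
  private seen = new Set<string>();

  // Returns an entry if the URL is new, or null if an equivalent URL was already added.
  add(originalUrl: string): QueueEntry | null {
    const key = comparisonKey(originalUrl);
    if (this.seen.has(key)) {
      return null;
    }
    this.seen.add(key);
    return { requestUrl: originalUrl, key };
  }
}
```

In this sketch the original string is kept untouched in `requestUrl`, so any quirks the server depends on survive, while two spellings of the same URL still collapse to one entry.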
New Contributors
Full Changelog: v1.11.4...v1.12.0
Browsertrix Crawler v1.12.0-beta.2
Browsertrix Crawler v1.12.0-beta.1
What's Changed
- Fix default user-agent to not include minor version + set sec-ch-ua-* headers by @ikreymer in #962
- fix issues related to profile directory placed in /profile by @ikreymer in #969
- add decompress() interceptor, support undici.request() without decompression + keep content-encoding if no decompression by @ikreymer in #970
- Dedup Initial Implementation by @ikreymer in #889
- Fix browser network loading by @ikreymer in #963
- frame behaviors: use frame.evaluate() instead of custom evaluateWithCLI() by @ikreymer in #964
- rollover: check for rollover before writing new records, not after. by @ikreymer in #974
- Support QA with deduped crawls by @ikreymer in #977
- Add Indexer options to commit/cancel single crawl by @ikreymer in #978
- fix regression where behaviors are run prematurely in new pages due to remaining 'framenavigated' listener by @ikreymer in #979
- Fix up comment related to cleaning up event listeners from behaviors by @tw4l in #980
- Add OpenContainers labels to Dockerfile to support Dependabot by @Mr0grog in #972
- Interrupt instead of fail crawl when not fatal by @ikreymer in #973
- Fix streaming response / retry mechanism when loading from browser by @ikreymer in #975
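The rollover change in #974 moves the size check ahead of the write. A minimal sketch of that ordering, using an in-memory stand-in for files: `RollingWriter`, its fields, and the character-count size measure are hypothetical and do not reflect the crawler's WARC writer.

```typescript
// Hypothetical in-memory writer; each inner array stands in for one output file.
class RollingWriter {
  readonly files: string[][] = [];
  private current: string[] | null = null;
  private currentSize = 0;
  private readonly rolloverSize: number;

  constructor(rolloverSize: number) {
    this.rolloverSize = rolloverSize;
  }

  write(record: string): void {
    let file = this.current;
    // Check BEFORE writing: open a new file only when there is a record to put in it.
    if (file === null || this.currentSize >= this.rolloverSize) {
      file = [];
      this.files.push(file);
      this.current = file;
      this.currentSize = 0;
    }
    file.push(record);
    this.currentSize += record.length;
  }
}
```

One consequence of checking first, at least in this sketch: a file that crosses the limit on its last record is not followed by an empty file, because a new file is only opened when another record arrives.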
New Contributors
Full Changelog: v1.12.0-beta.0...v1.12.0-beta.1
Browsertrix Crawler v1.11.4
What's Changed
- add decompress() interceptor, support undici.request() without decompression + keep content-encoding if no decompression by @ikreymer in #970
Full Changelog: v1.11.3...v1.11.4
Browsertrix Crawler v1.11.3
What's Changed
- Fix default user-agent to not include minor version + set sec-ch-ua-* headers by @ikreymer in #962
- fix issues related to profile directory placed in /profile by @ikreymer in #969
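For context on the user-agent fix in #962: Chrome's reduced User-Agent format reports only the major version, as `Chrome/<major>.0.0.0`. The sketch below shows one way to rewrite a full version token into that form. `reduceChromeVersion` is a hypothetical helper, and whether the crawler uses this exact approach is an assumption.

```typescript
// Hypothetical helper: keep the Chrome major version and zero out the rest.
// Strings without a four-part Chrome version token are returned unchanged.
function reduceChromeVersion(userAgent: string): string {
  return userAgent.replace(/Chrome\/(\d+)\.\d+\.\d+\.\d+/, "Chrome/$1.0.0.0");
}

const fullUserAgent =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.109 Safari/537.36";

console.log(reduceChromeVersion(fullUserAgent));
```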
Full Changelog: v1.11.2...v1.11.3
Browsertrix Crawler v1.12.0-beta.0
Initial Dedupe Beta Release (for internal testing)
Full Changelog: v1.11.2...v1.12.0-beta.0