Releases: webrecorder/browsertrix-crawler

Browsertrix Crawler v1.12.4

01 Apr 00:44
cee501a

What's Changed

  • track crawlIds included in each --collection directory by @ikreymer in #1005
  • qa: don't add pages from WACZ files that have fromDependency set by @ikreymer in #1010
  • fix hashtag in seed URLs for single page / non-SPA scopes by @ikreymer in #1013
  • add reference to external WACZ per revisit record by @ikreymer in #1009

Full Changelog: v1.12.3...v1.12.4

Browsertrix Crawler v1.12.3

18 Mar 20:30

What's Changed

Full Changelog: v1.12.2...v1.12.3

Browsertrix Crawler v1.12.2

11 Mar 21:53
0a5ed51

What's Changed

Full Changelog: v1.12.1...v1.12.2

Browsertrix Crawler v1.12.1

11 Mar 18:41
989d05b

What's Changed

Full Changelog: v1.12.0...v1.12.1

Browsertrix Crawler v1.12.0

10 Mar 20:15

Major features

What's Changed

  • Dedup Initial Implementation by @ikreymer in #889
  • Fix browser network loading by @ikreymer in #963
  • frame behaviors: use frame.evaluate() instead of custom evaluateWithCLI() by @ikreymer in #964
  • rollover: check for rollover before writing new records, not after. by @ikreymer in #974
  • Support QA with deduped crawls by @ikreymer in #977
  • Add Indexer options to commit/cancel single crawl by @ikreymer in #978
  • fix regression where behaviors are run prematurely in new pages due to remaining 'framenavigated' listener by @ikreymer in #979
  • Fix up comment related to cleaning up event listeners from behaviors by @tw4l in #980
  • Add OpenContainers labels to Dockerfile to support Dependabot by @Mr0grog in #972
  • Interrupt instead of fail crawl when not fatal by @ikreymer in #973
  • Fix streaming response / retry mechanism when loading from browser by @ikreymer in #975
  • Upgrade normalize-url to v9.0.0, fix seed decoding issue by @Mr0grog in #984
  • Redis config tweaks by @ikreymer in #987
  • Request original url, store normalized url for comparison check/dedupe by @ikreymer in #986
  • deps update for 1.12 by @ikreymer in #988
  • Dedupe docs by @ikreymer in #989

New Contributors

Full Changelog: v1.11.4...v1.12.0

Browsertrix Crawler v1.12.0-beta.2

03 Mar 01:07
4c60d82

Pre-release

What's Changed

  • Upgrade normalize-url to v9.0.0, fix seed decoding issue by @Mr0grog in #984
  • Redis config tweaks by @ikreymer in #987
  • Request original url, store normalized url for comparison check/dedupe by @ikreymer in #986

Full Changelog: v1.12.0-beta.1...v1.12.0-beta.2

Browsertrix Crawler v1.12.0-beta.1

23 Feb 21:06
974b22f

Pre-release

What's Changed

  • Fix default user-agent to not include minor version + set sec-ch-ua-* headers by @ikreymer in #962
  • fix issues related to profile directory placed in /profile by @ikreymer in #969
  • add decompress() interceptor, support undici.request() without decompression + keep content-encoding if no decompression by @ikreymer in #970
  • Dedup Initial Implementation by @ikreymer in #889
  • Fix browser network loading by @ikreymer in #963
  • frame behaviors: use frame.evaluate() instead of custom evaluateWithCLI() by @ikreymer in #964
  • rollover: check for rollover before writing new records, not after. by @ikreymer in #974
  • Support QA with deduped crawls by @ikreymer in #977
  • Add Indexer options to commit/cancel single crawl by @ikreymer in #978
  • fix regression where behaviors are run prematurely in new pages due to remaining 'framenavigated' listener by @ikreymer in #979
  • Fix up comment related to cleaning up event listeners from behaviors by @tw4l in #980
  • Add OpenContainers labels to Dockerfile to support Dependabot by @Mr0grog in #972
  • Interrupt instead of fail crawl when not fatal by @ikreymer in #973
  • Fix streaming response / retry mechanism when loading from browser by @ikreymer in #975

New Contributors

Full Changelog: v1.12.0-beta.0...v1.12.0-beta.1

Browsertrix Crawler v1.11.4

12 Feb 18:16
325d7fe

What's Changed

  • add decompress() interceptor, support undici.request() without decompression + keep content-encoding if no decompression by @ikreymer in #970

Full Changelog: v1.11.3...v1.11.4

Browsertrix Crawler v1.11.3

09 Feb 23:31

What's Changed

  • Fix default user-agent to not include minor version + set sec-ch-ua-* headers by @ikreymer in #962
  • fix issues related to profile directory placed in /profile by @ikreymer in #969

Full Changelog: v1.11.2...v1.11.3

Browsertrix Crawler v1.12.0-beta.0

31 Jan 00:40

Pre-release

Initial Dedupe Beta Release (for internal testing)

Full Changelog: v1.11.2...v1.12.0-beta.0