Fetcher: optionally slow down fetching from hosts with repeated exceptions #1106

@jnioche

See NUTCH-2946

For every fetch queue, the fetcher keeps a counter of the "exceptions" observed when fetching from the host (or domain or IP, respectively) bound to that queue.

To make the crawler more polite, the counter value could be used to dynamically increase the fetch delay for hosts where requests fail repeatedly with exceptions or with HTTP status codes mapped to ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too Many Requests, 5xx server errors, etc.). Of course, this should be optional. The aim is to reduce the load on such hosts before the configured maximum number of exceptions (property fetcher.max.exceptions.per.queue) is hit.
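
To illustrate the idea, here is a minimal sketch of how a per-queue exception counter could be turned into a longer fetch delay. It is not taken from the Nutch code base: the class `ExceptionBackoffSketch`, the method `adjustedDelayMillis`, and the `maxDelayMillis` cap are hypothetical names, and the doubling (exponential backoff) strategy is only one possible choice that the issue itself does not prescribe.

```java
// Hypothetical sketch, not part of Nutch: derive a fetch delay from the
// number of exceptions observed for a fetch queue.
class ExceptionBackoffSketch {

    /**
     * Returns the delay to apply before the next fetch from a queue.
     *
     * Assumption: the delay doubles with every observed exception and is
     * capped at maxDelayMillis. The result is never below the base delay.
     */
    static long adjustedDelayMillis(long baseDelayMillis, int exceptionCount,
                                    long maxDelayMillis) {
        long delay = baseDelayMillis;
        for (int i = 0; i < exceptionCount && delay < maxDelayMillis; i++) {
            // Guard against overflow before doubling.
            delay = (delay > maxDelayMillis / 2) ? maxDelayMillis : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        long base = 5_000L;  // example base delay in milliseconds
        long cap = 60_000L;  // example upper bound in milliseconds
        for (int exceptions = 0; exceptions <= 5; exceptions++) {
            System.out.println(exceptions + " exception(s): "
                + adjustedDelayMillis(base, exceptions, cap) + " ms");
        }
    }
}
```

With a scheme like this, a host that keeps failing is contacted less and less often while its counter climbs toward fetcher.max.exceptions.per.queue. Making the behaviour optional would then amount to skipping the adjustment and always using the base delay.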
