This example requires a few additional libraries. You can install them using pip:

```
pip install -r requirements.txt
```

Then run the workflow:

```
redun run workflow.py main
```

By default, this will scrape web pages from https://www.python.org/ with a depth of 2 link traversals. All of the HTML files encountered will be stored in crawl/. Word frequency across all pages will be calculated and a CSV of the word counts will be stored in computed/word_counts.txt.
Lastly, an HTML report is generated in reports/report.html that summarizes the scraping and analysis. The report is generated using a jinja2 template stored in templates/report.html.
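The word-frequency step can be pictured with a short sketch. This is not the example's actual implementation (the helper names `count_words` and `write_counts` are hypothetical); it just shows the general idea of stripping tags, tallying words with `collections.Counter`, and emitting CSV rows.

```python
import csv
import re
from collections import Counter

def count_words(html_texts):
    """Tally word frequency across a collection of HTML page texts.
    Tags are removed with a crude regex; a real crawler would use a parser."""
    counts = Counter()
    for html in html_texts:
        text = re.sub(r"<[^>]+>", " ", html)  # strip tags
        counts.update(word.lower() for word in re.findall(r"[a-zA-Z]+", text))
    return counts

def write_counts(counts, path):
    """Write (word, count) CSV rows, most common first."""
    with open(path, "w", newline="") as out:
        csv.writer(out).writerows(counts.most_common())

counts = count_words(["<p>redun makes workflows easy. Workflows!</p>"])
print(counts["workflows"])  # → 2
```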
Feel free to try other URLs and scraping depths using the task arguments:

```
redun run workflow.py main --url URL --depth DEPTH
```

Also feel free to alter the report template templates/report.html. It is passed to the task make_report() as a File argument, so rerunning the workflow automatically reacts to changes in the template.
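The reactivity works because redun fingerprints File arguments: if the template's hash changes between runs, the task is considered stale and reruns. Here is a minimal stdlib sketch of that idea using a content hash (a simplification; redun's actual File hashing scheme may differ, and the file name below is hypothetical):

```python
import hashlib
import tempfile
from pathlib import Path

def content_hash(path):
    """Hash a file's bytes; a simplified stand-in for how a workflow
    engine can fingerprint an input file to decide whether to rerun."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "report.html"  # hypothetical template file
    template.write_text("<h1>{{ title }}</h1>")
    before = content_hash(template)
    template.write_text("<h1>{{ title }} (v2)</h1>")  # edit the template
    after = content_hash(template)

print(before != after)  # → True: a changed template means a rerun
```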