wmde/WDumps-scraper

Scraping WDumps data

To better understand which kinds of entity data dump subsets our users are interested in, this repository scrapes all dump subsets listed under "recent dumps" on WDumps. For each dump, the scrape includes a JSON representation of the filters that were used to generate it.

The notebook generates a CSV file that presents the filter data in human-readable form. Each row of the CSV describes one dump and includes the following columns:

  • dump name
  • URL
  • filter (in human-readable form including labels for any items and properties used)
  • statements included in the dump (in human-readable form)
  • labels (yes/no)
  • descriptions (yes/no)
  • aliases (yes/no)
  • sitelinks (yes/no)
  • languages
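As an illustration of the row layout described above, here is a minimal Python sketch that flattens one scraped dump record into a CSV row. It is not the notebook's actual code: the input dictionary keys (`name`, `url`, `filter_text`, and so on), the exact header strings, the helper names `yes_no`, `dump_to_row`, and `write_rows`, and the example record with its URL are all assumptions made for this example.

```python
import csv
import io

# Column names follow the list above; the exact header strings are assumed.
COLUMNS = [
    "dump name", "URL", "filter", "statements",
    "labels", "descriptions", "aliases", "sitelinks", "languages",
]


def yes_no(flag):
    """Render a boolean as the yes/no value used in the CSV."""
    return "yes" if flag else "no"


def dump_to_row(dump):
    """Flatten one dump record (hypothetical dict layout) into a CSV row."""
    return {
        "dump name": dump["name"],
        "URL": dump["url"],
        "filter": dump["filter_text"],
        "statements": "; ".join(dump["statements"]),
        "labels": yes_no(dump["labels"]),
        "descriptions": yes_no(dump["descriptions"]),
        "aliases": yes_no(dump["aliases"]),
        "sitelinks": yes_no(dump["sitelinks"]),
        "languages": ", ".join(dump["languages"]),
    }


def write_rows(dumps, handle):
    """Write a header line followed by one row per dump."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for dump in dumps:
        writer.writerow(dump_to_row(dump))


# A made-up record, only to exercise the functions above.
example = {
    "name": "Example dump",
    "url": "https://example.org/dump/1",
    "filter_text": "instance of (P31): human (Q5)",
    "statements": ["date of birth (P569)"],
    "labels": True,
    "descriptions": False,
    "aliases": False,
    "sitelinks": True,
    "languages": ["en", "de"],
}

buffer = io.StringIO()
write_rows([example], buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file handle to `write_rows` instead would produce a CSV file on disk.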

Development

Install Dependencies for Package

pip install -e ".[dev]"
