Use Wikidata to complete seeds #50

@sebastian-nagel

Initially, the news crawler was seeded with URLs of news sites taken from DMOZ; see #8 for the procedure. DMOZ is no longer updated, but Wikidata could serve as a replacement source to complete the seed list:

  • select all instances of newspaper (news media, or similar) having an official website:
    SELECT DISTINCT ?item ?itemLabel ?lang ?url
    WHERE
    {
      ?item wdt:P31/wdt:P279* wd:Q11032.  # instance of newspaper or any subclass of it
      ?item wdt:P856 ?url.  # with official website
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,ru,fr,es,it,ja,zh,*" }
      OPTIONAL {
        ?item wdt:P407 ?language.  # language of work or name
        ?language wdt:P220 ?lang.  # ISO 639-3 code of that language
      }
    }
    LIMIT 50
    (execute query on Wikidata query service)
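
To turn the query result into seed URLs, something along these lines might work. This is only a sketch: it assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql (which predefines the `wdt:`, `wd:`, `wikibase:` and `bd:` prefixes) and the standard SPARQL JSON results format. The names `build_request`, `extract_seeds`, `fetch_seeds` and the User-Agent string are made up for illustration and are not part of the news crawler.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above (LIMIT kept at 50 for testing).
QUERY = """
SELECT DISTINCT ?item ?itemLabel ?lang ?url
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q11032.
  ?item wdt:P856 ?url.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,ru,fr,es,it,ja,zh,*" }
  OPTIONAL {
    ?item wdt:P407 ?language.
    ?language wdt:P220 ?lang.
  }
}
LIMIT 50
"""


def build_request(query, endpoint=WDQS_ENDPOINT):
    """Build a GET request asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical User-Agent; replace with a real contact string.
            "User-Agent": "news-seed-sketch/0.1 (example)",
        },
    )


def extract_seeds(results):
    """Map each official-website URL to the set of language codes seen for it.

    ?lang sits in an OPTIONAL block, so a row may have no "lang" binding.
    """
    seeds = {}
    for row in results["results"]["bindings"]:
        url = row["url"]["value"]
        langs = seeds.setdefault(url, set())
        lang = row.get("lang", {}).get("value")
        if lang:
            langs.add(lang)
    return seeds


def fetch_seeds(query=QUERY):
    """Run the query over the network and return the URL -> languages map."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        return extract_seeds(json.load(resp))
```

Calling `fetch_seeds()` performs the actual network request; each key of the returned dict is a candidate seed URL, and the language codes could help in assigning seeds to per-language lists. Since one newspaper may have several languages, the same URL can appear in multiple result rows, which is why the sketch groups by URL.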
