DocExtractor

A tiny Python 3 script that extracts individual Wikipedia documents from multiple files.

Requirements

  • Python 3
  • pathvalidate (pip3 install pathvalidate)

Usage

python3 doc_extractor.py --source={PATH_TO_SOURCE_DIR} --target={PATH_TO_TARGET_DIR} --extension={FILE_EXTENSION=article}
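
For orientation, here is a minimal sketch of how a script with these three flags might work. It is not the actual doc_extractor.py. It rests on several assumptions that the README does not confirm: the input files contain WikiExtractor-style blocks of the form <doc ... title="...">text</doc>, the --extension value names the extension of the written files, and "article" is the default when the flag is omitted (one reading of the {FILE_EXTENSION=article} placeholder). The names DOC_RE, parse_args, extract_documents, and main are hypothetical.

```python
import argparse
import re
from pathlib import Path

try:
    # Real dependency listed under Requirements.
    from pathvalidate import sanitize_filename
except ImportError:
    # Hypothetical fallback so the sketch still runs without pathvalidate.
    def sanitize_filename(name):
        return re.sub(r'[\\/:*?"<>|\x00-\x1f]', "", name).strip()

# ASSUMPTION: each document looks like <doc ... title="Some Title"> text </doc>.
DOC_RE = re.compile(r'<doc\b[^>]*\btitle="([^"]*)"[^>]*>\n?(.*?)</doc>', re.DOTALL)


def parse_args(argv=None):
    """Parse the three flags shown in the Usage line."""
    parser = argparse.ArgumentParser(description="Extract documents from multiple files.")
    parser.add_argument("--source", required=True, help="directory holding the input files")
    parser.add_argument("--target", required=True, help="directory for the extracted documents")
    # ASSUMPTION: 'article' is the default extension of the written files.
    parser.add_argument("--extension", default="article", help="extension of the written files")
    return parser.parse_args(argv)


def extract_documents(source, target, extension="article"):
    """Write every <doc> block found under `source` to its own file in `target`."""
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() materializes the listing before anything is written.
    for path in sorted(Path(source).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for title, body in DOC_RE.findall(text):
            # Titles can contain characters that are invalid in file names.
            name = sanitize_filename(title) or "untitled"
            out_path = target_dir / f"{name}.{extension}"
            out_path.write_text(body.strip() + "\n", encoding="utf-8")
            written.append(out_path)
    return written


def main(argv=None):
    args = parse_args(argv)
    return extract_documents(args.source, args.target, args.extension)
```

Calling main(["--source=dumps", "--target=articles"]) would mirror the command line above, with dumps and articles as placeholder directory names. In this sketch, documents that share a title overwrite one another; a real tool may handle that case differently.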