* [ ] Use HOCR file format https://en.wikipedia.org/wiki/HOCR
* [ ] Test pipeline
* [ ] Design repository structure