GitHub - HairyFotr/OCRTrain: (Over)fit OCR to your dataset with genetic algorithms. (outdated, ping for update)

-------------
 OCR Trainer
-------------

1. Put your images into img/
2. Put your transcriptions into a text file, in the form "filename transcription"
3. Run runTrain <yourTextFile>
4. Wait and wait and wait :)
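
To make the "filename transcription" format from step 2 concrete, here is a minimal sketch of how such a file could be read. It is not taken from OCRTrain.scala: the object name Transcriptions, the one-entry-per-line layout, and the rule that everything after the first whitespace is the transcription are all assumptions.

```scala
// Hypothetical helper for illustration; not part of OCRTrain.scala.
object Transcriptions {
  // Assumed format: one entry per line, "filename transcription".
  // The first token is the image file name, the rest of the line
  // (which may contain spaces) is the expected text.
  def parse(lines: Seq[String]): Seq[(String, String)] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      line.split("\\s+", 2) match {
        case Array(name, text) => (name, text)
        case Array(name)       => (name, "") // line with no transcription
      }
    }
}
```

For example, Transcriptions.parse(Seq("a.png hello world", "b.png x")) pairs a.png with "hello world" and b.png with "x". Blank lines are skipped.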

Things you need:
  sudo apt-get install imagemagick tesseract-ocr gocr cuneiform ocrad

Things you should probably set (a.k.a. things I should abstract away into files):
  allowedCharacters / stringFilter
  the appropriate string scoring algorithm
  different sequence of param-changing algorithms
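
As a rough idea of what the first two settings could look like, here is a sketch of a character filter plus a string score. Only the names allowedCharacters and stringFilter come from the list above; their types, the whitelist itself, the Scoring object, and the choice of normalized Levenshtein distance as the scoring algorithm are assumptions, not necessarily what OCRTrain.scala does.

```scala
// Hypothetical sketch; signatures and scoring choice are assumptions.
object Scoring {
  // Assumed whitelist: lowercase letters, digits and space. Adjust to your dataset.
  val allowedCharacters: Set[Char] = (('a' to 'z') ++ ('0' to '9')).toSet + ' '

  // Lowercase the text and drop every character outside the whitelist.
  def stringFilter(s: String): String =
    s.toLowerCase.filter(allowedCharacters.contains)

  // Levenshtein edit distance, computed row by row with dynamic programming.
  def levenshtein(a: String, b: String): Int = {
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        cur(j) = math.min(math.min(cur(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = cur
    }
    prev(b.length)
  }

  // Score in [0, 1]; 1.0 means the filtered OCR output matches the
  // filtered transcription exactly.
  def score(ocrOutput: String, expected: String): Double = {
    val got = stringFilter(ocrOutput)
    val want = stringFilter(expected)
    val longest = math.max(got.length, want.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(got, want).toDouble / longest
  }
}
```

A normalized edit distance gives partial credit for nearly correct output, which gives a search over OCR parameters a smoother signal than an exact-match test. If your dataset only needs exact matches, a simpler score may be the more appropriate choice.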