Add languages to OCR


Post by Gasur » Sun Jan 13, 2019 12:45 am


I've been trying to add Danish to the list of OCR languages, so far without success. By default the list includes less common languages such as Ancient Greek, but not Danish.
I am using the Docker version, set up manually with MySQL instead of PostgreSQL. I also spun up a completely fresh instance using the one-line installer on your website, with no luck.
Danish is supported by Tesseract, as shown here: ... Data-Files.

root@db34fa5eaea5:/opt/mayan-edms# apt-cache search tesseract-ocr
tesseract-ocr-dan-frak - tesseract-ocr language files for Danish (Fraktur)
tesseract-ocr-dan - tesseract-ocr language files for Danish
tesseract-ocr-osd - tesseract-ocr language files for script and orientation
tesseract-ocr-eng - tesseract-ocr language files for English
tesseract-ocr - Tesseract command line OCR tool
tesseract-ocr-equ - tesseract-ocr language files for equations

What I've tried so far:
  • Reinstalled the Docker containers with different databases (although the database shouldn't make a difference).
  • Installed the OCR language packages using the -e MAYA_APT_INSTALL parameter.
  • Installed the packages manually inside the container with apt install tesseract-ocr-dan tesseract-ocr-dan-frak.
  • Changed the OCR backend from the default to ocr.backends.tesseract.Tesseract, but the container then crashed with an error saying that no such module exists.
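
One way to confirm that the manual apt install actually reached Tesseract is to ask the tesseract binary which language codes it can load. This is a minimal sketch. It assumes the commands run inside the Mayan EDMS container and that the installed tesseract supports the --list-langs option. The function names lang_in_list and has_tesseract_lang are hypothetical helpers, not part of Mayan EDMS or Tesseract:

```shell
#!/bin/sh
# Hypothetical helpers for checking which Tesseract language data is installed.

# lang_in_list CODE: succeeds if CODE appears as a whole line on stdin.
# The header line ("List of available languages ...") never matches,
# and neither does "dan_frak" when asking for "dan", because -x
# requires the entire line to equal CODE.
lang_in_list() {
    grep -qx -- "$1"
}

# has_tesseract_lang CODE: asks the installed tesseract binary which
# languages it can load. Assumes tesseract supports --list-langs; some
# versions print the list to stderr, so both streams are merged.
has_tesseract_lang() {
    tesseract --list-langs 2>&1 | lang_in_list "$1"
}
```

For example, `has_tesseract_lang dan && echo "Danish data found"`. If dan is listed, the Tesseract data files are present, which would narrow the problem down to how Mayan builds its language list.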

Any ideas?
