Serbian language

Post by kolenmi

I cannot submit documents for OCR in the Serbian language.
I get the following message:
"Exception calling Tesseract with language option: hbs; RAN: /usr/bin/tesseract - - -l hbs STDOUT: STDERR: Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/hbs.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'hbs' Tesseract couldn't load any languages! Could not initialize tesseract. The requested OCR language "hbs" is not available and needs to be installed. "
I installed tesseract-ocr-srp and tesseract-ocr-srp-latn packages.
There is no tesseract-ocr-hbs package.
How can I solve this problem?
I need both scripts, Latin and Cyrillic, for production OCR.
Could you also include the Serbian language in Mayan EDMS?
Thank you
Re: Serbian language

Post by rosarior


As you noted, Tesseract doesn't support the HBS locale but it does support SRP and SRP-LATN. Install the supported ones with

-e MAYAN_APT_INSTALLS="tesseract-ocr-srp tesseract-ocr-srp-latn"
and when uploading your documents, select one of those languages instead of "HBS".
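
To confirm that the language data is actually in place, a quick check along these lines may help. It is only a sketch: the tessdata path is the one from the error message above, `check_tessdata` is a hypothetical helper name, and the `srp_latn` file name is an assumption about how Tesseract names the Serbian Latin data, so adjust both to your installation.

```shell
# check_tessdata DIR LANG...
# Hypothetical helper: reports whether DIR/LANG.traineddata exists
# for each language code given on the command line.
check_tessdata() {
  dir=$1
  shift
  for lang in "$@"; do
    if [ -f "$dir/$lang.traineddata" ]; then
      echo "$lang: installed"
    else
      echo "$lang: missing"
    fi
  done
}

# Path taken from the error message; language codes are assumed.
check_tessdata /usr/share/tesseract-ocr/4.00/tessdata srp srp_latn
```

Run it inside the container (or on the host where Tesseract is installed). Running `tesseract --list-langs` should give the same information straight from Tesseract.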

Translating Mayan into other languages is simple, but we rely on native-speaker volunteers:

1- Open an account on Transifex (it is free).
2- Request the creation of the Serbian language.
3- Translate the strings using the web interface; no programming knowledge is needed. We include a Google Translate API key and have enabled automatic translations to help you out.

As soon as the language translation percentage is above 15 or 20%, we will enable and include the language into Mayan.

