Korean character not displaying correctly. [SOLVED]

Lighthouse
Posts: 3
Joined: Fri Apr 10, 2020 6:56 am

Post by Lighthouse »

Hello, I have been testing this document management system and found a problem with files that contain Korean characters. I would not call it an 'issue' so much as the text simply being broken.


Here are some example files. One is encoded in EUC-KR, a legacy encoding for Korean, and the other in UTF-8. Both are displayed broken regardless.

https://www.dropbox.com/s/0e8nxpdqm8v5n ... n.txt?dl=0

https://www.dropbox.com/s/qfobebv8cfrpq ... 8.txt?dl=0

Rendered correctly, the text should look like this:
[Attachment: 2020-04-10.png]
I do see an option for "Korean" in the language selection. I am not sure whether it is broken or simply not supported yet. Thanks for your time!
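
For anyone who wants to rule out the files themselves, here is a minimal sketch that checks whether a file decodes cleanly as UTF-8 or EUC-KR. The function name decode_korean_text is made up for illustration and is not part of the document management system; it only uses Python's built-in codecs.

```python
from typing import Tuple


def decode_korean_text(raw: bytes) -> Tuple[str, str]:
    """Try UTF-8 first, then EUC-KR; return (text, encoding_used).

    Hypothetical helper for illustration only, not part of the
    document management system discussed in this thread.
    """
    for encoding in ("utf-8", "euc-kr"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    # Neither encoding matched: decode leniently so the caller still
    # gets something to look at, and flag the result as lossy.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

If both example files decode without falling through to the lossy branch, the file contents are probably fine and the problem is more likely on the rendering side.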

Lighthouse
Posts: 3
Joined: Fri Apr 10, 2020 6:56 am

Re: Korean character not displaying correctly.

Post by Lighthouse »

Plus: I looked at the OCR error section and I can see two separate errors, one for the EUC-KR file and one for the UTF-8 file.


For Test_korean.txt:
Exception calling Tesseract with language option: kor; RAN: /usr/bin/tesseract - - -l kor STDOUT: STDERR: Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/kor.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'kor' Tesseract couldn't load any languages! Could not initialize tesseract. The requested OCR language "kor" is not available and needs to be installed.

For Test_korean_UTF_8.txt:
duplicate key value violates unique constraint "file_caching_cachepartit_partition_id_filename_e553caf0_uniq" DETAIL: Key (partition_id, filename)=(23, base_image) already exists.
It seems the kor language data for Tesseract is not installed (or simply does not exist yet). But I do not understand the second error.
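
A quick way to confirm the first error is to check whether kor.traineddata is actually present in the tessdata directory. The sketch below assumes the default path from the error message and falls back to it when TESSDATA_PREFIX is not set; the function name has_tesseract_language is hypothetical.

```python
import os
from pathlib import Path

# Default directory taken from the error message above; an assumption
# for any other install.
DEFAULT_TESSDATA = "/usr/share/tesseract-ocr/4.00/tessdata"


def has_tesseract_language(lang, tessdata_dir=None):
    """Return True if <lang>.traineddata exists in the tessdata directory.

    Hypothetical helper: looks in tessdata_dir if given, otherwise in
    TESSDATA_PREFIX, otherwise in DEFAULT_TESSDATA.
    """
    base = Path(tessdata_dir or os.environ.get("TESSDATA_PREFIX", DEFAULT_TESSDATA))
    return (base / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    print("kor installed:", has_tesseract_language("kor"))
```

Running `tesseract --list-langs` inside the container should give the same answer from Tesseract itself.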

Lighthouse
Posts: 3
Joined: Fri Apr 10, 2020 6:56 am

Re: Korean character not displaying correctly.

Post by Lighthouse »

Thank you so much. Now it seems the documents are displayed correctly.

michael
Developer
Posts: 26
Joined: Sun Apr 19, 2020 6:21 am

Re: Korean character not displaying correctly. [SOLVED]

Post by michael »

Korean fonts were added to the Docker images in version 3.4.5. Thank you for validating the change!
