Cannot display Chinese character and cannot identify Excel files

leoliu
Posts: 2
Joined: Tue Sep 10, 2019 3:24 am

Cannot display Chinese character and cannot identify Excel files

Post by leoliu »

Hi There,

I encountered two issues when using Mayan EDMS:
1. Cannot display Chinese characters
When I upload files that contain Chinese characters, the characters are not displayed correctly; they are rendered as blank boxes instead. Please check the screenshot below:
The attachment BB.png is no longer available
Here is the error message in OCR errors:
Exception calling Tesseract with language option: cmn; RAN: /usr/bin/tesseract - - -l cmn STDOUT: STDERR: Error opening data file /usr/share/tesseract-ocr/tessdata/cmn.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language 'cmn' Tesseract couldn't load any languages! Could not initialize tesseract. The requested OCR language "cmn" is not available and needs to be installed.

2. Cannot identify Excel files
Some Excel files cannot be identified but some can. Please check the screenshot below:
[Attachment: BB.png]
Mayan EDMS is a good system and I am fascinated by it. Hope someone could give me a hand. Thanks in advance.

rosarior
Developer
Posts: 490
Joined: Tue Aug 21, 2018 3:28 am
Location: Puerto Rico

Re: Cannot display Chinese character and cannot identify Excel files

Post by rosarior »

Hi, thanks for the reports.

It appears that the OCR engine, Tesseract, does not have Mandarin support installed. The error message says it cannot find the language data file for Mandarin ("cmn"). The error is not fatal: only OCR will fail, and all other features remain available.
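To see which language data the Tesseract binary can actually load, something like the following may help. It is only a sketch: it assumes the binary lives at /usr/bin/tesseract (the path shown in the error message) and supports the --list-langs option. The helper names are made up for this example and are not part of Mayan EDMS. Note that Tesseract's own Chinese data files are named chi_sim and chi_tra, so "cmn" may be reported as missing even when Chinese data is installed.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def installed_tesseract_languages(binary="/usr/bin/tesseract"):
    """Run the Tesseract binary (path is an assumption) and return its language codes."""
    result = subprocess.run(
        [binary, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some releases print the list to stderr and others to stdout, so read both.
    # Any warning lines Tesseract prints would also end up in the returned set.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_tesseract_languages()
    for code in ("cmn", "chi_sim", "chi_tra"):
        print(code, "installed" if code in langs else "missing")
```

When running Mayan EDMS in Docker, the script has to run inside the container, since that is where Tesseract and its tessdata directory live.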

The character rendering problem could be as simple as missing fonts. Can you provide some files that trigger the issue, so I can reproduce it locally? They can be files with random content that you create, as long as they behave the same way as the ones you describe. They should not contain any real information.

Thank you.

Re: Cannot display Chinese character and cannot identify Excel files

Post by leoliu »

Hi
Thanks for your quick reply.
Here is a sample file attached.
[Attachment: 教师节快乐.zip]

maybe
Posts: 1
Joined: Mon Sep 30, 2019 4:22 am

Re: Cannot display Chinese character and cannot identify Excel files

Post by maybe »

I have the same issue. Can somebody share how to fix this problem? Thank you!

Re: Cannot display Chinese character and cannot identify Excel files

Post by rosarior »

The issue was a missing font in the Docker container. Fixed with https://gitlab.com/mayan-edms/mayan-edm ... beb403a670
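For anyone who wants to check whether their own container has a font that covers Chinese, here is a rough sketch. It assumes fontconfig's fc-list tool is installed in the container and uses its ":lang=zh" pattern to list matching font families. The function names are made up for this example, and which font the fix actually adds is not stated here.

```python
import subprocess


def parse_fc_list_families(output):
    """Return unique font family names from `fc-list <pattern> family` output."""
    families = []
    for line in output.splitlines():
        # A line may hold several comma-separated names for one font; keep the first.
        family = line.split(",")[0].strip()
        if family and family not in families:
            families.append(family)
    return families


def chinese_font_families():
    """List installed font families that fontconfig reports as covering Chinese."""
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_fc_list_families(result.stdout)


if __name__ == "__main__":
    families = chinese_font_families()
    if families:
        print("Fonts covering Chinese:", ", ".join(families))
    else:
        print("No font covering Chinese was found; characters may render as boxes.")
```

An empty result would be consistent with the blank boxes described in the first post.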
[Attachments: 2019-09-30_04-26_chinese_1.png, 2019-09-30_04-26_chinese_2.png]
This will be available in the next bug fix version, to be released in the next few days, or maybe hours, depending on the progress of other issues being fixed.

Re: Cannot display Chinese character and cannot identify Excel files

Post by rosarior »

Version 3.2.8 was just released and includes the fix for this. Please give it a try and let us know if it fixed the issue. Thanks!
