Mayan 3.03 search not as expected

When things don't work as they should.
Stefan
Posts: 5
Joined: Sun Sep 30, 2018 7:28 pm

Post by Stefan » Sun Sep 30, 2018 7:46 pm

Hello,
I am new to Mayan. I have installed the Docker container with Mayan 3.03 and so far have about 2000 pages of PDFs in the database.
My problem is that the search mostly behaves like an "OR". I cannot get "AND" to work unless one of the two terms does not exist at all, which seems very strange to me.

Example document search terms and results:
Wohngebäude ==> returns 2 documents (expected)
Wohngebäude 2018 ==> returns 100 documents (wrong)
Wohngebäude AND 2018 ==> returns 100 documents as well (wrong)
Wohngebäude AND 2018bla ==> returns 0 documents (expected)

I can reproduce this with any other combination of terms and get the same effect.
I would appreciate any help, because without a working search Mayan is of very limited use to me.

Thanks!
Stefan

Re: Mayan 3.03 search not as expected

Post by Stefan » Mon Oct 01, 2018 6:19 pm

Another example of the weird search behaviour:

Stefan AND Zeugnis ==> returns 12 pages (ok)
Zeugnis AND Stefan ==> returns 100 pages (not plausible)
Wohngebäude AND 2018 ==> returns 100 pages, but
2018 AND Wohngebäude ==> returns 42 pages (ok)

I would really appreciate it if somebody could explain to me how I should combine search terms.

Thanks,
Stefan

rosarior
Posts: 159
Joined: Tue Aug 21, 2018 3:28 am

Re: Mayan 3.03 search not as expected

Post by rosarior » Mon Oct 01, 2018 6:21 pm

The search syntax you are using is correct. We'll add more test cases to the test suite to find the cause of the unexpected results you are encountering.
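As a point of reference for what correct results look like, here is a toy sketch (a hypothetical in-memory index with made-up document texts, not Mayan's actual search code) in which AND is a set intersection and OR is a set union. Two consequences follow directly: swapping the order of the terms cannot change an AND result, and an AND result can never contain more documents than either term returns on its own. With "Wohngebäude" alone returning 2 documents, "Wohngebäude AND 2018" should therefore return at most 2.

```python
from typing import Dict, Iterable, Set

def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased, whitespace-separated term to the IDs of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def search_and(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """AND as set intersection: a document must contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    # set.intersection(*postings) returns a new set, so the index is never mutated.
    return set.intersection(*postings) if postings else set()

def search_or(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """OR as set union: a document may contain any of the terms."""
    result: Set[str] = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result

# Made-up sample texts, only for illustration.
docs = {
    "d1": "Wohngebäude Versicherung 2018",
    "d2": "Wohngebäude Versicherung 2017",
    "d3": "Rechnung 2018",
}
index = build_index(docs)

print(sorted(search_and(index, ["Wohngebäude", "2018"])))     # ['d1']
print(sorted(search_and(index, ["2018", "Wohngebäude"])))     # ['d1'] (order does not matter)
print(sorted(search_or(index, ["Wohngebäude", "2018"])))      # ['d1', 'd2', 'd3']
print(sorted(search_and(index, ["Wohngebäude", "2018bla"])))  # []
```

A real search engine adds stemming, ranking and result limits on top of this, but none of those should make an AND query return more hits than its most selective term.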

Re: Mayan 3.03 search not as expected

Post by Stefan » Tue Oct 02, 2018 5:07 am

Thank you so much!

best regards,
Stefan

Re: Mayan 3.03 search not as expected

Post by rosarior » Fri Oct 05, 2018 7:02 am

Version 3.1.4 is out and includes improvements in the search module. Give it a try.

https://docs.mayan-edms.com/releases/3.1.4.html

Re: Mayan 3.03 search not as expected

Post by Stefan » Sun Oct 07, 2018 6:42 pm

Hi,

Today I updated to 3.1.4 and the search has improved a lot. Now that the search results are much more accurate, I noticed that Mayan's OCR does not seem to recognize any umlauts, which is one reason documents are not being found.

I can see in the OCR text that umlauts are not detected.

Is it possible to configure Mayan to recognize umlauts / use German as the OCR language? Is the German language data for tesseract installed in the docker image?

Thanks,
Stefan

Re: Mayan 3.03 search not as expected

Post by Stefan » Sun Oct 07, 2018 6:48 pm

I've just seen that tesseract-ocr-deu was not installed in the container.

Would it work if I installed it manually in the container, and would it be possible to re-run OCR on all the PDFs already stored?

Thanks,
Stefan

Re: Mayan 3.03 search not as expected

Post by rosarior » Sun Oct 07, 2018 7:52 pm

Because adding all tesseract dictionaries would make the image too big, we added the ability to install them when launching the image.

Pass the environment variable MAYAN_APT_INSTALLS with the list of Debian packages you wish to install. Here is an example from the documentation: https://docs.mayan-edms.com/topics/dock ... ple-method

After that, change the document languages to German (this can also be done in bulk by selecting all documents on each page), and in the Tools menu use "Submit all documents of a type for OCR" to launch OCR again for all the documents of each type.

To make German the default language for new documents, change the DOCUMENTS_LANGUAGE setting to deu. This can be done from the Setup menu -> Settings -> Documents -> [EDIT] or by passing the environment variable MAYAN_DOCUMENTS_LANGUAGE when launching the image.
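For the Docker setup described in this reply, here is a minimal sketch of the launch command with both environment variables set. It assumes the image is published as `mayanedms/mayanedms:3.1.4` on Docker Hub (adjust the tag to your release) and leaves out the port and volume options you would normally add. The helper name `docker_run_args` is hypothetical; the code only assembles and prints the command.

```python
import shlex
from typing import Dict, List

# Assumption: Docker Hub image name and tag; adjust to the release you run.
IMAGE = "mayanedms/mayanedms:3.1.4"

def docker_run_args(env: Dict[str, str], image: str = IMAGE) -> List[str]:
    """Hypothetical helper: build a `docker run` argument list with one `-e NAME=VALUE` per setting."""
    args = ["docker", "run", "-d"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(image)  # the image name must come after all options
    return args

env = {
    # Debian packages to install when the container starts.
    "MAYAN_APT_INSTALLS": "tesseract-ocr-deu",
    # Default language for newly uploaded documents.
    "MAYAN_DOCUMENTS_LANGUAGE": "deu",
}
args = docker_run_args(env)
print(shlex.join(args))
# docker run -d -e MAYAN_APT_INSTALLS=tesseract-ocr-deu -e MAYAN_DOCUMENTS_LANGUAGE=deu mayanedms/mayanedms:3.1.4
```

`shlex.join` quotes any value that contains spaces, which matters if several packages are passed in MAYAN_APT_INSTALLS as one space-separated string (an assumption based on how the documentation example looks). To launch from Python rather than a shell, the same list could be handed to `subprocess.run(args, check=True)`.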

The list of available document languages can be changed by modifying the DOCUMENTS_LANGUAGE_CODES setting in the same way.
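After relaunching, one way to confirm that the German data actually reached tesseract is `tesseract --list-langs`, which prints a header line followed by one language code per line. The sketch below assumes it runs inside the container with `tesseract` on the PATH; the function names are hypothetical, and because the header wording differs between tesseract versions, the parser simply skips the first line.

```python
import subprocess
from typing import Set

def parse_list_langs(output: str) -> Set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes.

    The first line is a header (its wording varies between tesseract
    versions); every following non-empty line is one language code.
    """
    lines = [line.strip() for line in output.splitlines()]
    return {line for line in lines[1:] if line}

def installed_tesseract_langs() -> Set[str]:
    """Run tesseract (must be on the PATH); some versions print the list on stderr."""
    proc = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)
    return parse_list_langs(proc.stdout or proc.stderr)

# Sample output for illustration; inside the container call installed_tesseract_langs().
sample = "List of available languages (3):\ndeu\neng\nosd\n"
print(sorted(parse_list_langs(sample)))  # ['deu', 'eng', 'osd']
print("deu" in parse_list_langs(sample))  # True
```

If `deu` is missing from the list, the package was not installed at startup, and re-running OCR will still produce text without umlauts.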