Search not working anymore

Hello,
thanks for making this great software.

Recently I noticed that the search function stopped working. Since I don't use it very often, I don't know when the problem started, whether with a specific version or with the switch to the Docker Compose installation.

However, every time I try to use the search I get this message:

Search backend error. Verify that the search service is available and that the search syntax is valid for the active search backend; 
[Errno 2] No such file or directory: '/var/lib/mayan/whoosh/_documents.documentsearchresult_0.toc.1682926140.9308484'

Checking inside the container, I can confirm that not even the /whoosh sub-folder exists.
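
For reference, this is roughly how I checked (a sketch, assuming the compose service is named frontend, as in my setup):

# List the Whoosh index directory named in the error message inside the
# running container; the service name "frontend" is an assumption.
docker compose exec frontend ls -la /var/lib/mayan/whoosh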

Re-indexing the search backend also did not help. After trying that, I noticed some errors in the container logs, though I am not sure whether they are related to the search problem:

  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "

[2023-05-01 18:13:16 +0000] [6970] [INFO] Autorestarting worker after current request.

[2023-05-01 18:13:16 +0000] [6970] [INFO] Worker exiting (pid: 6970)

[2023-05-01 18:13:16 +0000] [6989] [INFO] Booting worker with pid: 6989

/opt/mayan-edms/lib/python3.9/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (5.1.0)/charset_normalizer (2.0.12) doesn't match a supported version!

  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "

I am running version 4.4.6, installed via Docker Compose on a regular PC.
Any idea what I might have done wrong?

Thank you!

Hi,

Run the following command to initialize the search system:

docker compose run frontend run_command search_initialize

Afterward the search system can be reindexed from the user interface or with the command:

docker compose run frontend run_command search_reindex

The state of the reindex can be inspected with the command:

docker compose run frontend run_command search_status

This will display the indexing status of each search model.


Thank you very much, it seems to work.

But is it normal that it always tries to install the same package when I run those commands?

And should the numbers displayed reflect the actual numbers of documents and pages in my Mayan installation? It does not seem right to have 35 document pages across 875 documents. I let it run overnight and the numbers are no longer changing.

ralf@ralf-PC:~/Docker/compose 4.4.6$ docker compose run frontend run_command search_status
mayan: starting entrypoint.sh
Connection attempt #1 to: port postgresql:5432; Connected.
Connection attempt #1 to: port rabbitmq:5672; Connected.
Connection attempt #1 to: port redis:6379; Connected.
mayan: update_uid_gid()
usermod: no changes
mayan: os_package_installs()
Get:1 http://deb.debian.org/debian bullseye InRelease [116 kB]
Get:2 http://deb.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:3 http://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:4 http://deb.debian.org/debian bullseye/main amd64 Packages [8183 kB]
Get:5 http://deb.debian.org/debian-security bullseye-security/main amd64 Packages [237 kB]
Get:6 http://deb.debian.org/debian bullseye-updates/main amd64 Packages [14.6 kB]
Fetched 8643 kB in 30s (286 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
tesseract-ocr-deu
0 upgraded, 1 newly installed, 0 to remove and 38 not upgraded.
Need to get 746 kB of archives.
After this operation, 1541 kB of additional disk space will be used.
Get:1 http ://deb.debian.org/debian bullseye/main amd64 tesseract-ocr-deu all 1:4.00~git30-7274cfa-1.1 [746 kB]
Fetched 746 kB in 2s (444 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package tesseract-ocr-deu.
(Reading database ... 20450 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-deu_1%3a4.00~git30-7274cfa-1.1_all.deb ...
Unpacking tesseract-ocr-deu (1:4.00~git30-7274cfa-1.1) ...
Setting up tesseract-ocr-deu (1:4.00~git30-7274cfa-1.1) ...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
mayan: pip_installs()
/opt/mayan-edms/lib/python3.9/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (5.1.0)/charset_normalizer (2.0.12) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
Whoosh search model indexing status
===================================
Cabinet: 94
Document: 875
Document file: 50
Document file page: 35
Document type: 0
Document version: 75
Document version page: 74
Group: 0
Index instance node: 252
Message: 0
Metadata type: 0
Role: 0
Signature capture: 0
Tag: 18
User: 0

But is it normal that it always tries to install the same package when I run those commands?

Yes, Docker container filesystems are not persistent. If an OCR package is installed at startup, it will be installed again every time a fresh container is started from the image.
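
As a sketch of the mechanism (assuming the package is listed in the MAYAN_APT_INSTALLS environment variable, which the entrypoint's os_package_installs() step shown in your log processes on every start):

# Hypothetical docker-compose.yml excerpt. On every container start the
# entrypoint apt-installs the packages listed here into the fresh container
# layer, which is why the install repeats. Baking the package into a custom
# image would avoid the repeated download.
services:
  frontend:
    environment:
      MAYAN_APT_INSTALLS: "tesseract-ocr-deu"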

And should the numbers displayed reflect the actual numbers of documents and pages in my Mayan installation? It does not seem right to have 35 document pages across 875 documents. I let it run overnight and the numbers are no longer changing.

The numbers might not be 100% accurate because this is what the search system reports based on its own indexes, which do not always have a 1:1 correlation to database objects.
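
If you want to cross-check against the database, here is a rough sketch (the model names and the valid manager are assumptions about the internals; adjust as needed):

# Open a Django shell in a one-off container and count objects directly.
docker compose run frontend run_command shell
>>> from mayan.apps.documents.models import Document, DocumentFilePage
>>> Document.valid.count()            # documents not in the trash (assumed manager)
>>> DocumentFilePage.objects.count()  # pages across all document files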

But for the most part, the page count should be the same as or larger than the document count. Otherwise, this could mean several things: the indexing task is being killed by the Linux OOM killer due to low RAM or too few CPU cores, or some documents simply don't have pages.
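
To rule out the OOM case, check the kernel log on the Docker host (a sketch; the exact message wording varies by kernel and distribution):

# Look for out-of-memory kills around the time the reindex task ran.
dmesg -T | grep -i -E 'out of memory|oom-kill|killed process'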

In the future, please keep forum topics focused on one thing and open a separate topic for each different discussion.

Thank you for your help, and sorry for the mixed topics. I am not very proficient with Docker, and I didn't really understand whether those things were related. I will do my best not to mix different topics in the future.
