Upgrade from 4.0.22 to 4.2.5 - OCR, Parsing etc. not working anymore

Posted: Thu Jun 16, 2022 8:17 am
by bernroth
Dear Community!

We recently upgraded our Mayan EDMS installation (based on docker-compose) from 4.0.22 to 4.2.5.
This will be the last "issue" I report for the time being.

After fixing the sources, documents are imported and shown in the "Recently created" section.
Unfortunately, there is no "Content" (OCR data) available for them.
Options like "delete parsed content" and "submit for parsing" are not working.

While watching the logs, I found suspicious errors like this one:

app_1                                     | [2022-06-15 15:18:18,957: ERROR/ForkPoolWorker-177] Task mayan.apps.dynamic_search.tasks.task_deindex_instance[8c2f0382-3e39-42b7-ae13-9b09265ae51c] raised unexpected: DoesNotExist('IndexInstanceNode matching query does not exist.')
app_1                                     | Traceback (most recent call last):
app_1                                     |   File "/opt/mayan-edms/lib/python3.9/site-packages/celery/app/trace.py", line 450, in trace_task
app_1                                     |     R = retval = fun(*args, **kwargs)
app_1                                     |   File "/opt/mayan-edms/lib/python3.9/site-packages/celery/app/trace.py", line 731, in __protected_call__
app_1                                     |     return self.run(*args, **kwargs)
app_1                                     |   File "/opt/mayan-edms/lib/python3.9/site-packages/mayan/apps/dynamic_search/tasks.py", line 29, in task_deindex_instance
app_1                                     |     instance = Model._meta.default_manager.get(pk=object_id)
app_1                                     |   File "/opt/mayan-edms/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
app_1                                     |     return getattr(self.get_queryset(), name)(*args, **kwargs)
app_1                                     |   File "/opt/mayan-edms/lib/python3.9/site-packages/django/db/models/query.py", line 435, in get
app_1                                     |     raise self.model.DoesNotExist(
app_1                                     | mayan.apps.document_indexing.models.IndexInstanceNode.DoesNotExist: IndexInstanceNode matching query does not exist.
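
To get an overview of which tasks fail and how often, the log output could be tallied with a small script. The following is only a sketch: it assumes the Celery error line format shown in the excerpt above, and the function name tally_task_errors is made up for illustration, not part of Mayan EDMS.

```python
import re
from collections import Counter

# Matches Celery error lines of the form seen in the log excerpt above:
# "... ERROR/<worker>] Task <dotted.task.name>[<uuid>] raised unexpected: <Exception>(...)"
ERROR_LINE = re.compile(
    r"ERROR/\S+\] Task (?P<task>[\w.]+)\[[0-9a-f-]+\] "
    r"raised unexpected: (?P<exc>\w+)\("
)


def tally_task_errors(lines):
    """Count (task name, exception name) pairs across an iterable of log lines.

    Hypothetical helper; lines that do not match the pattern (for example
    the traceback lines) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[(match.group("task"), match.group("exc"))] += 1
    return counts


# Example with the error line from the excerpt above.
sample = [
    "app_1 | [2022-06-15 15:18:18,957: ERROR/ForkPoolWorker-177] Task "
    "mayan.apps.dynamic_search.tasks.task_deindex_instance"
    "[8c2f0382-3e39-42b7-ae13-9b09265ae51c] raised unexpected: "
    "DoesNotExist('IndexInstanceNode matching query does not exist.')",
    "app_1 | Traceback (most recent call last):",
]

for (task, exc), count in tally_task_errors(sample).most_common():
    print(count, task, exc)
```

In practice the lines would come from a saved copy of the container log (for example, an open file object passed to tally_task_errors) rather than from a hard-coded list.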

I think the best way to fix these issues may be to delete all OCR data and caches and recreate them.

I would only like to keep:

- documents
- tags
- cabinets

... and recreate all other data from scratch.
Given the amount of data, this will probably take weeks to complete, but afterwards there will hopefully be no more weird error messages in the log.

Is there any other way to address these issues?

Best regards,
Bernhard