ElasticSearch bulk reindexing errors

joekhoobyar
Posts: 17
Joined: Mon Feb 24, 2020 3:49 am

Post by joekhoobyar »

I get a lot of failures like this in the logs, for reindexing things like cabinets (see below) and tags.

mayan.apps.dynamic_search.tasks <1758> [ERROR] "task_index_instance() line 108 Unexpected error calling `task_index_instance` with keyword arguments {'app_label': 'cabinets', 'model_name': 'cabinet', 'object_id': 2, 'exclude_app_label': None, 'exclude_model_name': None, 'exclude_kwargs': None}."
[2022-07-04 17:17:03,331: ERROR/ForkPoolWorker-280] Unexpected error calling `task_index_instance` with keyword arguments {'app_label': 'cabinets', 'model_name': 'cabinet', 'object_id': 2, 'exclude_app_label': None, 'exclude_model_name': None, 'exclude_kwargs': None}.
[2022-07-04 17:17:03,334: ERROR/ForkPoolWorker-280] Task mayan.apps.dynamic_search.tasks.task_index_instance[19c88bb9-d9b0-454e-af12-5ceb9481ac19] raised unexpected: DynamicSearchException("Unexpected error calling `task_index_instance` with keyword arguments {'app_label': 'cabinets', 'model_name': 'cabinet', 'object_id': 2, 'exclude_app_label': None, 'exclude_model_name': None, 'exclude_kwargs': None}.")
Traceback (most recent call last):
  File "/opt/mayan-edms/lib/python3.9/site-packages/mayan/apps/dynamic_search/tasks.py", line 88, in task_index_instance
    SearchBackend.get_instance().index_instance(
  File "/opt/mayan-edms/lib/python3.9/site-packages/mayan/apps/dynamic_search/backends/elasticsearch.py", line 195, in index_instance
    self.get_client().index(
  File "/opt/mayan-edms/lib/python3.9/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/opt/mayan-edms/lib/python3.9/site-packages/elasticsearch/client/__init__.py", line 413, in index
    return self.transport.perform_request(
  File "/opt/mayan-edms/lib/python3.9/site-packages/elasticsearch/transport.py", line 466, in perform_request
    raise e
  File "/opt/mayan-edms/lib/python3.9/site-packages/elasticsearch/transport.py", line 427, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/opt/mayan-edms/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py", line 291, in perform_request
    self._raise_error(response.status, raw_data)
  File "/opt/mayan-edms/lib/python3.9/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', "failed to parse field [documents__uuid] of type [keyword] in document with id '2'. Preview of field's value: 'c01d64c7-bde0-4bd2-8caa-0e9927da93f2 658e5d7f-4004-4af5-ae28-844bf557eb37 .... SNIP (sooooo many values) ...")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/mayan-edms/lib/python3.9/site-packages/celery/app/trace.py", line 450, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/opt/mayan-edms/lib/python3.9/site-packages/celery/app/trace.py", line 731, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/mayan-edms/lib/python3.9/site-packages/mayan/apps/dynamic_search/tasks.py", line 109, in task_index_instance
    raise DynamicSearchException(error_message) from exception
mayan.apps.dynamic_search.exceptions.DynamicSearchException: Unexpected error calling `task_index_instance` with keyword arguments {'app_label': 'cabinets', 'model_name': 'cabinet', 'object_id': 2, 'exclude_app_label': None, 'exclude_model_name': None, 'exclude_kwargs': None}.
I can tell that the reindex has at least partially succeeded at this point, because I can successfully search for a lot of documents.
michael
Developer
Posts: 297
Joined: Sun Apr 19, 2020 6:21 am

Re: ElasticSearch bulk reindexing errors

Post by michael »

This error message took a long time to figure out. Here is what we've discovered.

Document UUID fields are mapped as ElasticSearch `keyword` fields. A Mayan document UUID is 36 characters long. It turns out that ElasticSearch `keyword` fields cannot hold unlimited data: a single value is restricted to 32766 bytes.

If a document container object like an index or a cabinet has more than about 910 documents attached (32766 / 36 ≈ 910, and somewhat fewer once the spaces between the UUIDs are counted), the resulting search engine index refresh query will contain a `keyword` field value longer than 32766 bytes, which triggers the error message.
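The arithmetic behind that threshold can be checked with a short sketch. It assumes the UUIDs are joined into one string with a single-space separator, as the value preview in the traceback suggests. The function names are made up for illustration and are not part of Mayan.

```python
import uuid

UUID_LENGTH = 36        # characters in a canonical UUID string (ASCII, 1 byte each)
MAX_TERM_BYTES = 32766  # per-value limit quoted above


def joined_length(count, separator=" "):
    """Byte length of `count` UUIDs joined by `separator` (hypothetical helper)."""
    if count == 0:
        return 0
    separator_bytes = len(separator.encode("utf-8"))
    return count * UUID_LENGTH + (count - 1) * separator_bytes


def max_uuids(separator=" "):
    """Largest number of joined UUIDs that still fits within MAX_TERM_BYTES."""
    separator_bytes = len(separator.encode("utf-8"))
    # count * (36 + sep) - sep <= limit  =>  count <= (limit + sep) / (36 + sep)
    return (MAX_TERM_BYTES + separator_bytes) // (UUID_LENGTH + separator_bytes)


if __name__ == "__main__":
    print("no separator:", max_uuids(""))      # the 32766 / 36 estimate
    print("space separator:", max_uuids(" "))  # a little lower
    sample = " ".join(str(uuid.uuid4()) for _ in range(max_uuids(" ") + 1))
    print("one past the limit:", len(sample.encode("utf-8")), "bytes")
```

If the real separator differs from a single space, the exact cut-off moves accordingly, but it stays in the same range of several hundred attached documents.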

We are investigating the possibility and outcomes of using a different ElasticSearch field type for Mayan's UUID fields.
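One direction that could be explored, offered only as a sketch and not as what Mayan does or will do: ElasticSearch accepts an array of values for a `keyword` field and indexes each element as its own term, so the byte limit would apply to each UUID rather than to the joined string. The helper names below are hypothetical; only the `documents__uuid` field name comes from the traceback.

```python
import uuid

MAX_TERM_BYTES = 32766  # per-value limit quoted above


def build_cabinet_document(document_uuids):
    """Hypothetical helper: index body with the UUIDs as a list, not one string."""
    return {"documents__uuid": [str(value) for value in document_uuids]}


def longest_value_bytes(body, field="documents__uuid"):
    """Byte length of the longest single value stored under `field`."""
    values = body[field]
    if isinstance(values, str):
        values = [values]
    return max((len(value.encode("utf-8")) for value in values), default=0)


if __name__ == "__main__":
    uuids = [uuid.uuid4() for _ in range(2000)]

    # Current shape, as the traceback preview suggests: one space-joined string.
    joined = {"documents__uuid": " ".join(str(value) for value in uuids)}
    # Alternative shape: one short value per attached document.
    as_list = build_cabinet_document(uuids)

    for label, body in (("joined", joined), ("list", as_list)):
        size = longest_value_bytes(body)
        print(label, size, "bytes, fits:", size <= MAX_TERM_BYTES)
```

Whether this fits the way Mayan builds and queries its search fields is an open question; the sketch only shows that the size problem comes from joining the values, not from the number of documents as such.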
germain
Posts: 4
Joined: Tue Jul 26, 2022 12:32 am
Location: Canada

Re: ElasticSearch bulk reindexing errors

Post by germain »

Hi, I found this thread while searching for a problem I had with ElasticSearch and Mayan.

When I search for any term, I only get irrelevant results. Anyway, I need to do further debugging before submitting bugs.

But I have a question.

Why are you using an ES `keyword` field for indexing many UUIDs?

You can't search inside these fields because they are not tokenized. You can only use them to retrieve a row.

Maybe you could hash. Maybe you can explain more so I can help.

Regards
Germain, security specialist