In the meantime, I have made further progress. First, I checked the installation: everything is set up correctly and I could not find any errors. Mayan also works flawlessly on both installations I maintain, as long as the database is used as the search backend.
However, while checking, I noticed an error in the installation instructions (see: Direct deployment — Mayan EDMS 4.4.5 documentation). The instructions say that three Redis databases are needed. This is now outdated: according to the current instructions, only two Redis databases are used, databases 1 and 2. Database 0 was used for the queues in older instructions, but that part of the setup has since been replaced by RabbitMQ and AMQP.
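For reference, the current direct-deployment instructions configure this roughly as follows: Celery results go to Redis database 1, the lock manager to database 2, and the broker is RabbitMQ over AMQP. The variable names below are abbreviated from the 4.4 documentation, so confirm them against your version:
MAYAN_CELERY_RESULT_BACKEND="redis://:${MAYAN_REDIS_PASSWORD}@127.0.0.1:6379/1"
MAYAN_LOCK_MANAGER_BACKEND="mayan.apps.lock_manager.backends.redis_lock.RedisLock"
MAYAN_LOCK_MANAGER_BACKEND_ARGUMENTS="{'redis_url': 'redis://:${MAYAN_REDIS_PASSWORD}@127.0.0.1:6379/2'}"
MAYAN_CELERY_BROKER_URL="amqp://mayan:${MAYAN_RABBITMQ_PASSWORD}@127.0.0.1:5672/mayan"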
I also observed the locking in the database and checked whether errors occur there. I could not find any obvious errors in Mayan's locking mechanism.
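For anyone who wants to repeat this check, the active database locks can be inspected from the Django shell. A rough sketch, assuming the Lock model of Mayan's lock_manager app (verify the model and field names against your version):
# Run inside "mayan-edms.py shell"; the model path and fields are
# assumptions based on Mayan 4.x, not guaranteed for every release.
from mayan.apps.lock_manager.models import Lock

for lock in Lock.objects.all():
    print(lock.name, lock.creation_datetime, lock.timeout)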
I found two solutions to the index build-up problem and tested both successfully:
- Keep triggering the reindex process until all documents are indexed; several triggers may be necessary (see the sketch after this list). Once all files are indexed, subsequent updates of the Whoosh index work without errors, for example when new documents are added or existing documents are changed in bulk.
- Split worker B into two workers, B and B2. Worker B2 only processes the queues search and search_slow, and only worker B2 gets --concurrency=1 in the supervisor configuration (see the attached config below). As soon as the system has completed the initial indexing, --concurrency=1 can be removed. Again, once all files are indexed, subsequent updates of the Whoosh index work without errors.
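As a sketch of the first solution: the reindex can also be retriggered from the command line. The search_reindex management command name is taken from the Mayan documentation, and the number of passes and the pause between them are my own guesses, so adjust both to your installation:
import subprocess
import time

# Trigger the reindex several times; the binary path matches the
# attached supervisor config. Each pass should be given time to let
# the search/search_slow queues drain before the next trigger.
for attempt in range(3):
    subprocess.run(
        ["/opt/mayan-edms/bin/mayan-edms.py", "search_reindex"],
        check=True,
    )
    time.sleep(600)  # guessed pause between passes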
About the cause of the problem:
I have a guess as to what might be causing the problem. The locks in Whoosh are created and managed by the Whoosh library itself. According to the Whoosh documentation, a lock is created as soon as a writer object is requested through the API. If several processes request a writer object at almost the same time, I can imagine these objects blocking each other. There is one file lock per index, so a writer that needs to write to several indexes may get access to one index but not to the others. Instead of writing everything it currently can, it waits for the missing write permission.
Concretely, in a minimal installation with worker B2 running with --concurrency=2, I observed both processes sitting in the queue and waiting 180 seconds for the write permission to be released. Once that time was up, several outcomes were possible. If one worker obtained all the write permissions, the index continued to be built or written. If both workers requested write permissions from the Whoosh API at nearly the same time again, they usually sent each other back into the 180 s wait; then nothing happens and the index is not built. If my assumption is correct, it also explains why both solutions described above work.
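To make the per-index file lock concrete, here is a small standalone example I put together (my own illustration, not Mayan code). It shows that Whoosh refuses a second writer on the same index while the first one still holds the lock:
import tempfile

from whoosh import index
from whoosh.fields import Schema, TEXT

# One WRITELOCK file per index directory; a second writer cannot be
# created while the first still holds the lock.
schema = Schema(title=TEXT(stored=True))
ix = index.create_in(tempfile.mkdtemp(), schema)

first = ix.writer()          # acquires the index's file lock
try:
    second = ix.writer()     # fails: the lock is already taken
except index.LockError:
    print("second writer is blocked by the file lock")
finally:
    first.cancel()           # releases the lock without committing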
The simplest solution to this problem would be to ensure that the workers never request a file lock from the Whoosh API at the same time. This can be achieved very easily by skewing them in time with random delays: before each Whoosh API call that creates a writer object, simply wait a random interval, e.g. 3 to 30 seconds. This should ensure that only one worker (namely the randomly fastest one) gets all the write permissions it needs.
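A minimal sketch of that idea, assuming the jitter is applied in a hypothetical wrapper around the writer call. The helper name is my own invention, and the 180 s lock timeout mirrors the wait I observed above:
import random
import time

from whoosh import index

def open_writer_with_jitter(index_dir, min_wait=3.0, max_wait=30.0):
    """Hypothetical helper: wait a random interval before requesting
    the writer so concurrent workers rarely collide on the lock."""
    time.sleep(random.uniform(min_wait, max_wait))  # random time skew
    ix = index.open_dir(index_dir)
    # timeout is how long Whoosh retries the file lock before raising
    # LockError; 180 mirrors the 180 s wait observed above.
    return ix.writer(timeout=180)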
Attached is my config for the supervisor daemon:
[supervisord]
environment=
PYTHONPATH="/home/mayan/media/user_settings",
MAYAN_ALLOWED_HOSTS='["*"]',
MAYAN_MEDIA_ROOT="/home/mayan/media/",
MAYAN_PYTHON_BIN_DIR=/opt/mayan-edms/bin/,
MAYAN_GUNICORN_BIN=/opt/mayan-edms/bin/gunicorn,
MAYAN_GUNICORN_LIMIT_REQUEST_LINE=4094,
MAYAN_GUNICORN_MAX_REQUESTS=500,
MAYAN_GUNICORN_REQUESTS_JITTER=50,
MAYAN_GUNICORN_TEMPORARY_DIRECTORY="",
MAYAN_GUNICORN_TIMEOUT=120,
MAYAN_GUNICORN_WORKER_CLASS=sync,
MAYAN_GUNICORN_WORKERS=3,
MAYAN_SETTINGS_MODULE=mayan.settings.production,
MAYAN_WORKER_A_CONCURRENCY="",
MAYAN_WORKER_A_MAX_MEMORY_PER_CHILD="--max-memory-per-child=300000",
MAYAN_WORKER_A_MAX_TASKS_PER_CHILD="--max-tasks-per-child=100",
MAYAN_WORKER_B_CONCURRENCY="",
MAYAN_WORKER_B_MAX_MEMORY_PER_CHILD="--max-memory-per-child=300000",
MAYAN_WORKER_B_MAX_TASKS_PER_CHILD="--max-tasks-per-child=100",
MAYAN_WORKER_B2_CONCURRENCY="--concurrency=1",
MAYAN_WORKER_B2_MAX_MEMORY_PER_CHILD="--max-memory-per-child=300000",
MAYAN_WORKER_B2_MAX_TASKS_PER_CHILD="--max-tasks-per-child=100",
MAYAN_WORKER_C_CONCURRENCY="",
MAYAN_WORKER_C_MAX_MEMORY_PER_CHILD="--max-memory-per-child=300000",
MAYAN_WORKER_C_MAX_TASKS_PER_CHILD="--max-tasks-per-child=100",
MAYAN_WORKER_D_CONCURRENCY="--concurrency=1",
MAYAN_WORKER_D_MAX_MEMORY_PER_CHILD="--max-memory-per-child=300000",
MAYAN_WORKER_D_MAX_TASKS_PER_CHILD="--max-tasks-per-child=10",
_LAST_LINE=""
[program:mayan-edms-gunicorn]
autorestart = true
autostart = true
command = %(ENV_MAYAN_GUNICORN_BIN)s --bind 127.0.0.1:8000 --env DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s --limit-request-line %(ENV_MAYAN_GUNICORN_LIMIT_REQUEST_LINE)s --max-requests %(ENV_MAYAN_GUNICORN_MAX_REQUESTS)s --max-requests-jitter %(ENV_MAYAN_GUNICORN_REQUESTS_JITTER)s %(ENV_MAYAN_GUNICORN_TEMPORARY_DIRECTORY)s --worker-class %(ENV_MAYAN_GUNICORN_WORKER_CLASS)s --timeout %(ENV_MAYAN_GUNICORN_TIMEOUT)s --workers %(ENV_MAYAN_GUNICORN_WORKERS)s mayan.wsgi
environment =
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
redirect_stderr = true
user = mayan
[program:mayan-edms-worker_a]
autorestart = true
autostart = true
command = nice -n 0 %(ENV_COMMAND)s
environment =
COMMAND = "%(ENV_MAYAN_PYTHON_BIN_DIR)scelery -A mayan worker %(ENV_MAYAN_WORKER_A_CONCURRENCY)s --hostname=mayan-edms-worker_a.%%h --loglevel=ERROR -Ofair --queues=converter,sources_fast %(ENV_MAYAN_WORKER_A_MAX_MEMORY_PER_CHILD)s %(ENV_MAYAN_WORKER_A_MAX_TASKS_PER_CHILD)s --without-gossip --without-heartbeat",
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-edms-worker_b]
autorestart = true
autostart = true
command = nice -n 2 %(ENV_COMMAND)s
environment =
COMMAND = "%(ENV_MAYAN_PYTHON_BIN_DIR)scelery -A mayan worker %(ENV_MAYAN_WORKER_B_CONCURRENCY)s --hostname=mayan-edms-worker_b.%%h --loglevel=ERROR -Ofair --queues=document_states_medium,documents,duplicates,file_caching,file_metadata,indexing,metadata,parsing,sources %(ENV_MAYAN_WORKER_B_MAX_MEMORY_PER_CHILD)s %(ENV_MAYAN_WORKER_B_MAX_TASKS_PER_CHILD)s --without-gossip --without-heartbeat",
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-edms-worker_b2]
autorestart = true
autostart = true
command = nice -n 2 %(ENV_COMMAND)s
environment =
COMMAND = "%(ENV_MAYAN_PYTHON_BIN_DIR)scelery -A mayan worker %(ENV_MAYAN_WORKER_B2_CONCURRENCY)s --hostname=mayan-edms-worker_b2.%%h --loglevel=ERROR -Ofair --queues=search,search_slow %(ENV_MAYAN_WORKER_B2_MAX_MEMORY_PER_CHILD)s %(ENV_MAYAN_WORKER_B2_MAX_TASKS_PER_CHILD)s --without-gossip --without-heartbeat",
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-edms-worker_c]
autorestart = true
autostart = true
command = nice -n 10 %(ENV_COMMAND)s
environment =
COMMAND = "%(ENV_MAYAN_PYTHON_BIN_DIR)scelery -A mayan worker %(ENV_MAYAN_WORKER_C_CONCURRENCY)s --hostname=mayan-edms-worker_c.%%h --loglevel=ERROR -Ofair --queues=checkouts_periodic,documents_periodic,events,mailing,signatures,sources_periodic,statistics,uploads %(ENV_MAYAN_WORKER_C_MAX_MEMORY_PER_CHILD)s %(ENV_MAYAN_WORKER_C_MAX_TASKS_PER_CHILD)s --without-gossip --without-heartbeat",
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-edms-worker_d]
autorestart = true
autostart = true
command = nice -n 15 %(ENV_COMMAND)s
environment =
COMMAND = "%(ENV_MAYAN_PYTHON_BIN_DIR)scelery -A mayan worker %(ENV_MAYAN_WORKER_D_CONCURRENCY)s --hostname=mayan-edms-worker_d.%%h --loglevel=ERROR -Ofair --queues=ocr,storage_periodic,tools %(ENV_MAYAN_WORKER_D_MAX_MEMORY_PER_CHILD)s %(ENV_MAYAN_WORKER_D_MAX_TASKS_PER_CHILD)s --without-gossip --without-heartbeat",
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-edms-celery-beat]
autorestart = true
autostart = true
command = nice -n 1 %(ENV_COMMAND)s
environment =
COMMAND = "%(ENV_MAYAN_PYTHON_BIN_DIR)scelery -A mayan beat --pidfile= -l ERROR",
DJANGO_SETTINGS_MODULE=%(ENV_MAYAN_SETTINGS_MODULE)s
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
I hope that all my observations will help to make Mayan even better. I wish everyone success in this wonderful task.