Hello,
I’m following up on a couple of previous threads by @DocCyblade regarding document processing times and search backends
(Document Processing Times and Search Backends with Small Servers - #3 by DocCyblade).
I’ve been doing some more tests, with the same results as before.
Except this time I noticed by chance that after taking down all of Mayan’s docker containers and then bringing them back up, all rabbitmq queues (in this case the search one) are wiped out and return to 0, after having held several thousand or even millions of messages.
Here are a few screenshots to try to explain it.
Right after uploading new documents, be it 2 or 500, single-page PDFs or XMLs of a few KB:
After a couple of minutes it becomes steady like this, and no more events are generated on the Tools/Events page:
or when I uploaded 500 files at once:
This is the CPU situation:
After taking down all containers and starting them again 1-2 minutes later:
And of course CPU usage went back down to normal.
Besides CPU, I/O was also affected. I guess that is normal when processing documents, but it could stay that way for hours.
Based on how long it took for a queue of about 1 million messages to settle down (about 36 hours), I estimated that a queue of 30 million rabbitmq messages would take weeks. All this for something like 1500 small, single-page PDFs.
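As a rough check of that estimate, the arithmetic can be written out. This is only a sketch: it assumes the queue drains at a constant rate (about 36 hours per million messages, as observed above), which may well not hold in practice, and the variable names are made up for illustration.

```shell
# Back-of-the-envelope drain time, assuming a constant rate.
hours_per_million=36   # observed: about 1 million messages in about 36 hours
queue_millions=30      # size of the large queue, in millions of messages

total_hours=$(( hours_per_million * queue_millions ))
total_days=$(( total_hours / 24 ))
total_weeks=$(( total_days / 7 ))   # integer division, rounds down

echo "${total_hours} hours, ${total_days} days, roughly ${total_weeks} weeks"
# prints: 1080 hours, 45 days, roughly 6 weeks
```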
Is this expected behaviour? Or am I interrupting something by taking down the containers? @roberto.rosario
Please note that to stop and start the containers, I used the standard, graceful docker compose command:
docker compose --file docker-compose.yml --project-name mayan down
and the corresponding up command.
You have to set an explicit hostname for the rabbitmq container. Otherwise it gets a randomly generated one each time you spin it up. The problem is that rabbitmq also uses the hostname as the name of the folder where it saves its data, so a new container will never find the data stored by the previous one. I suggested adding a hostname to the default docker-compose.yml on the old forums, but so far it is not there.
I see, thanks for the explanation @DrRSatzteil.
At this point, what do I lose when rabbitmq spins up a new container and loses its previous data?
As far as Mayan functionality goes, everything seems normal. Documents are all loaded and indexed. There are no new events in the Events page.
Also, at the moment I’m keeping OCR, metadata parsing and document analysis turned off. At first I just need to store documents as they are. If needed, I’ll enable OCR and analysis later.
–
At the moment this is my docker-compose.yml for rabbitmq.
Isn’t “RABBITMQ_DEFAULT_VHOST” what you are referring to?
rabbitmq:
  image: ${MAYAN_DOCKER_RABBITMQ_IMAGE:-rabbitmq}:${MAYAN_DOCKER_RABBITMQ_TAG:-3.11.2-management-alpine}
  environment:
    RABBITMQ_DEFAULT_USER: ${MAYAN_RABBITMQ_USER:-mayan}
    RABBITMQ_DEFAULT_PASS: ${MAYAN_RABBITMQ_PASSWORD:-xxxxxxxxx}
    RABBITMQ_DEFAULT_VHOST: ${MAYAN_RABBITMQ_VHOST:-mayan}
Looking at disk usage, I did indeed find that the rabbitmq volume takes up a huge amount of space.
Are these all leftover “dead” data folders of the terminated containers? And now I’d need to get rid of them…
du -h -d 1 /var/lib/docker/volumes/
40M /var/lib/docker/volumes/mayan_elasticsearch
90G /var/lib/docker/volumes/mayan_rabbitmq
1.1G /var/lib/docker/volumes/mayan_app
249M /var/lib/docker/volumes/mayan_postgres
8.0K /var/lib/docker/volumes/mayan_redis
91G /var/lib/docker/volumes/
du -h -d 1 /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia
84M /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@00d7cb6bcd6d
11G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@897e093d8df0
26M /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@e9a285ecdeed
368K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@7afdc413e11f
6.9G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@abf7a1a3c193
304K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@541c54bfea96
68M /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@8cbfe97ac8a1
39G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@4db085038e19-plugins-expand
312K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@0c67d4f38237
7.9G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@f1d7d02b74a3
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@e9a285ecdeed-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@897e093d8df0-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@00d7cb6bcd6d-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@7afdc413e11f-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@867b9d5af26a-plugins-expand
6.5G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@867b9d5af26a
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@d536576435ba-plugins-expand
256K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@e336e837d3f2
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@e336e837d3f2-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@f1d7d02b74a3-plugins-expand
110M /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@4db085038e19
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@abf7a1a3c193-plugins-expand
5.9G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@c2b68e422e1e
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@0c67d4f38237-plugins-expand
320K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@d6a0e15cf4ea
14G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@d536576435ba
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@541c54bfea96-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@d6a0e15cf4ea-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@541bda7f8e6e-plugins-expand
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@c2b68e422e1e-plugins-expand
368K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@541bda7f8e6e
4.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@8cbfe97ac8a1-plugins-expand
90G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia
du -h -d 1 /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/CEB7RRIXP0PD83Y9KT4N4X1NY
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/1JIXAJKACKNYX8ZCESNLC0VRC
39G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/529H3BLAXFEY42L8SI9M43LKU
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/7C4UTFFSMZSPU15XKMNI86ZC4
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/BWGWV9A1F43533M0AXHC8GB8B
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/ZSLNG19H0JCP70JZ4SJE0W1F
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/D52Q0W80C3O7FF2TCQL0DKAZ7
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/7SE52JVO4UNX09OD4MBCLOGNN
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/6JMGFZ472FQ3A74AQR4831B
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/3IU3LA7LXAZEWBWRXOXCSCOED
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/A0Y2BHY7D84NBORSCY0TL5NOG
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/8UJHWPT3ULY01N49NS2H5J4OT
8.0K /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues/4AP1EO2XHUTCLKDGL6GK2AWT7
39G /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia/rabbit@57a4dfdd4213/msg_stores/vhosts/5RU2IGKKAZSSHOVUGSVFWGPHU/queues
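Before deleting anything, it may help to list which node folders are leftovers. The following is a minimal sketch, not an official Mayan or RabbitMQ tool: the function name list_stale_nodes is made up, and it assumes you know the node name currently in use (for example from running hostname inside the live rabbitmq container). It only prints candidates and removes nothing.

```shell
# List RabbitMQ node directories that do NOT belong to the node you want to keep.
# Arguments (both are assumptions you must adapt):
#   $1 = path to the mnesia directory inside the rabbitmq volume
#   $2 = node name to keep, e.g. rabbit@<current-hostname>
list_stale_nodes() {
  mnesia_dir="$1"
  keep_node="$2"
  for dir in "$mnesia_dir"/rabbit@*; do
    [ -d "$dir" ] || continue            # skip when the glob matched nothing
    name="$(basename "$dir")"
    case "$name" in
      "$keep_node"|"${keep_node}-plugins-expand") ;;   # still in use, skip
      *) printf '%s\n' "$dir" ;;                       # leftover candidate
    esac
  done
}

# Example call (the node name here is a placeholder):
# list_stale_nodes /var/lib/docker/volumes/mayan_rabbitmq/_data/mnesia rabbit@57a4dfdd4213
```

Review the printed list by hand and stop the containers before removing any of these folders, since a folder that still holds unprocessed messages is lost for good once deleted.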
Yes, you will collect lots of data over time if you don’t use an explicit hostname. That’s one good reason to set one.
No, it’s not the RABBITMQ_DEFAULT_VHOST setting; you need to add an explicit hostname property to the rabbitmq service. See the Compose file version 3 reference.
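One way to do this without editing the stock file is a small Compose override. This is a sketch under assumptions: the hostname value mayan-rabbitmq and the file name docker-compose.hostname.yml are arbitrary examples, not Mayan defaults, and it assumes the service is named rabbitmq under a top-level services key as in the snippet above.

```shell
# Write a Compose override file that pins the RabbitMQ container hostname.
# File name and hostname value are illustrative choices only.
override_file="docker-compose.hostname.yml"
cat > "$override_file" <<'EOF'
services:
  rabbitmq:
    hostname: mayan-rabbitmq
EOF

# Apply it together with the stock file (left commented out on purpose):
# docker compose --file docker-compose.yml --file "$override_file" --project-name mayan up --detach
```

With a fixed hostname the node should reuse the same data folder across restarts. Folders created earlier under random hostnames stay on disk until they are cleaned up by hand.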
With regard to Mayan functionality: some of the queued tasks are never executed because you lose the queues. The implications for Mayan functionality are, however, not quite clear. Most likely you lose some search indexing tasks, as these are usually executed rather late in the process and take a significant amount of time. However, you could also lose some index updates or other jobs. Most of this will probably heal automatically over time, because initial changes trigger a lot of follow-up processes. But for a really precise answer, someone with more insight would need to chime in.
Thank you for pointing me towards that article. In fact, I had never found that page before.
I found the root cause of the lost messages by myself, but I will read the documentation more carefully in the future.