Document event log peculiarities

When things don't work as they should.
daniel1113
Posts: 14
Joined: Tue Aug 21, 2018 2:32 pm

Post by daniel1113 » Wed Oct 03, 2018 5:41 pm

I'm running some performance tests on a Mayan install, including OCR performance. Yesterday I uploaded a new document, assigned metadata using the upload wizard, and then manually submitted the document for OCR. Looking at the event log, I see two oddities and am curious whether there is an explanation.

1. At upload time, Mayan is setting the metadata twice. Why?

2. I submitted the document for OCR on October 2 at 2:18pm. The log shows that OCR finished 4 times. Was it OCR'd 4 times? If so, why?

These oddities are occurring on many other documents, too.

Screen Shot 2018-10-03 at 11.09.35 AM.jpg

rosarior
Posts: 159
Joined: Tue Aug 21, 2018 3:28 am

Re: Document event log peculiarities

Post by rosarior » Thu Oct 04, 2018 7:08 pm

We have not seen that in other deployments. From the behavior you describe, I suspect that resource locking is not working and that all the worker instances are picking up the broker messages and executing the same tasks on the same objects. The problem appears to be one of synchronization. The default resource locking backend only works if the system is deployed on a single host. In a multi-host deployment, switch to the distributed resource locking backend as described here: https://wiki.mayan-edms.com/index.php?title=Scaling_up
Resource locking is a technique that prevents two processes or tasks from modifying the same resource at the same time, which would cause a race condition. Mayan uses its own lock manager. By default, the lock manager uses a simple file-based lock backend, ideal for single-host installations. For multi-host installations, the database backend must be used in order to coordinate the resource locks between the different hosts over a shared data medium. This is done by setting the environment variable LOCK_MANAGER_BACKEND, and works for both direct deployments and the Docker image. Use the value "lock_manager.backends.model_lock.ModelLock" to switch to the database resource lock backend. You can also write your own lock manager backend for other data-sharing mediums that perform better than a relational database, such as Redis, Memcached, or ZooKeeper.
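To make the locking idea concrete, here is a minimal Python sketch of a single-host, file-based lock guarding a task, assuming several workers receive the same task at once. `FileLock`, `LockError`, `ocr_task` and `run_workers` are hypothetical names for illustration only; this is not Mayan's `lock_manager` API, and the real backends (including `lock_manager.backends.model_lock.ModelLock`) may differ, for example by supporting lock timeouts.

```python
# Minimal sketch of a single-host, file-based resource lock guarding a task.
# Hypothetical names throughout; this is NOT Mayan's lock_manager API.
import os
import tempfile
import threading
import time


class LockError(Exception):
    """Raised when another holder already owns the lock."""


class FileLock:
    """One lock file per resource name in a local directory."""

    def __init__(self, name, lock_dir):
        self.path = os.path.join(lock_dir, f"{name}.lock")

    def acquire(self):
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller succeeds.
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise LockError(self.path) from None
        os.close(fd)

    def release(self):
        os.remove(self.path)


def ocr_task(document_id, results, lock_dir, barrier):
    """Hypothetical OCR task; skips the work if the document is locked."""
    lock = FileLock(f"ocr_document_{document_id}", lock_dir)
    barrier.wait()  # make every worker try to grab the lock at the same time
    try:
        lock.acquire()
    except LockError:
        return  # another worker is already processing this document
    try:
        time.sleep(0.2)  # stand-in for the actual OCR work
        results.append(document_id)
    finally:
        lock.release()


def run_workers(document_id, worker_count=4):
    """Simulate several workers receiving the same task at once."""
    results = []
    lock_dir = tempfile.mkdtemp()
    barrier = threading.Barrier(worker_count)
    threads = [
        threading.Thread(target=ocr_task, args=(document_id, results, lock_dir, barrier))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_workers(42))  # only one worker should have done the work
```

Because the lock files live in a local temporary directory, this only coordinates workers on one host, which is the limitation described above for the default backend; coordinating several hosts needs a shared medium such as the database. Note also that a lock only stops two workers from processing the same document at the same time: a worker that picks up a duplicate message after the first one has released the lock would still run the task again.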

daniel1113
Posts: 14
Joined: Tue Aug 21, 2018 2:32 pm

Re: Document event log peculiarities

Post by daniel1113 » Thu Oct 04, 2018 7:13 pm

While we will be looking at multi-host distribution in the future, this is all occurring in a single-host deployment using the stock master Docker image (3.1.3).

EDIT: I changed the MAYAN_WORKER_SLOW_CONCURRENCY setting to 0. That may be the culprit. We're testing now. Thanks, Roberto.
Last edited by daniel1113 on Wed Oct 24, 2018 5:48 pm, edited 1 time in total.

rosarior
Posts: 159
Joined: Tue Aug 21, 2018 3:28 am

Re: Document event log peculiarities

Post by rosarior » Thu Oct 04, 2018 7:53 pm

Yes, that could be the root of the issue. But even when the concurrency is increased, the locking system should still prevent duplicate tasks from running. This one will be hard for us to diagnose because it only happens in production, when multiple processes are running in parallel, but I've asked the team to look into it in case we missed something obvious when adding the concurrency settings. I'm interested in your use case; it could have uncovered an edge case we need to address.
