(2006, 'MySQL server has gone away')

When things don't work as they should.
Crayiii
Posts: 9
Joined: Fri Aug 24, 2018 12:25 am

(2006, 'MySQL server has gone away')

Post by Crayiii » Wed Nov 07, 2018 6:41 am

I get this error fairly often on the OCR page. If I resubmit the documents, the OCR works and the error goes away. Is there a way to resubmit, all at once, every document that doesn't have OCR completed?

rosarior
Posts: 159
Joined: Tue Aug 21, 2018 3:28 am

Re: (2006, 'MySQL server has gone away')

Post by rosarior » Thu Nov 15, 2018 6:35 am

I found this (https://matomo.org/faq/troubleshooting/faq_183/). Since the OCR update query sends a lot of text, my guess is that "max_allowed_packet" is the MySQL setting most likely to improve this.
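To make the guess above concrete, here is a minimal sketch of the size comparison involved. The function name and the `overhead` allowance are hypothetical, and the 4 MiB limit is only an assumed value; the real limit can be read from the server with `SHOW VARIABLES LIKE 'max_allowed_packet'`.

```python
def exceeds_packet_limit(ocr_text: str, max_allowed_packet: int, overhead: int = 1024) -> bool:
    """Return True if the encoded OCR text plus a rough allowance for the
    rest of the query would be larger than the packet limit (in bytes).

    `overhead` is a hypothetical allowance, not a MySQL value.
    """
    payload_size = len(ocr_text.encode("utf-8"))
    return payload_size + overhead > max_allowed_packet


# Assumed limit of 4 MiB; check the actual server value before relying on it.
ASSUMED_LIMIT = 4 * 1024 * 1024

small_page = "Invoice 123\nTotal: 40"
huge_document = "x" * (5 * 1024 * 1024)

print(exceeds_packet_limit(small_page, ASSUMED_LIMIT))
print(exceeds_packet_limit(huge_document, ASSUMED_LIMIT))
```

If large documents are the ones failing, raising the setting on the MySQL server is the first thing to try.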

We are working on a method to detect which documents don't have OCR. This is the first step toward adding the ability to resubmit documents whose OCR failed. Since OCR happens as a background task, this is not a straightforward problem to solve. The best solution at the moment is to make a workflow or index that groups the documents whose pages don't all have OCR content. This logic is not perfect, since a blank page won't return any OCR text and will be grouped with the failures.
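The grouping logic described above can be sketched in plain Python. This is not Mayan's index or workflow syntax; the dictionary of page texts and the function name are assumptions made for illustration only.

```python
from typing import Dict, List


def documents_missing_ocr(documents: Dict[str, List[str]]) -> List[str]:
    """Return the labels of documents where at least one page has no OCR text."""
    return [
        label
        for label, pages in documents.items()
        if any(not text.strip() for text in pages)
    ]


# Hypothetical data: document label -> OCR text of each page.
documents = {
    "invoice.pdf": ["Invoice 123", "Total: 40"],
    "scan.pdf": ["Page one text", ""],  # page 2 failed OCR
    "report.pdf": ["Summary", ""],      # page 2 is genuinely blank
}

print(documents_missing_ocr(documents))  # ['scan.pdf', 'report.pdf']
```

Note that "report.pdf" is flagged even though nothing went wrong with it, which is the blank page limitation mentioned above.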

One solution we have discussed is adding a flag to each page to record that the page has gone through the OCR process. This would require breaking up the current implementation of the OCR system, which runs one task per entire document so that it can send an event and a signal when the OCR of a document has finished. These events and signals are used by other parts of Mayan, like the Index and the Workflows. The OCR task would need to be split into smaller tasks that each operate on a single document page. This in turn creates a new problem of task synchronization. The library we use for background tasks supports this, but the feature comes with its own set of potential issues (http://docs.celeryproject.org/en/v2.3.3 ... tant-notes). These design decisions are the reason this feature has not been implemented yet.
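As a rough sketch of the idea above, the per-page flag and the "wait for every page, then signal once" step could look like the following. It uses Python's standard `concurrent.futures` as a stand-in for the background task library, and the `Page` class, `ocr_page`, `ocr_document` and the `on_finished` callback are all hypothetical names, not Mayan's code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    image: str                   # stand-in for the page image
    ocr_text: str = ""
    ocr_processed: bool = False  # the proposed per-page flag


def ocr_page(page: Page) -> Page:
    """Stand-in for the OCR of one page: 'recognizes' the image string."""
    page.ocr_text = page.image.strip()
    page.ocr_processed = True
    return page


def ocr_document(pages: List[Page], on_finished: Callable[[List[Page]], None]) -> None:
    """Run one small task per page, wait for all of them, then signal once."""
    with ThreadPoolExecutor() as pool:
        # Consuming the map blocks until every page task has completed.
        list(pool.map(ocr_page, pages))
    # Synchronization point: the "document finished" event fires a single time.
    on_finished(pages)


pages = [Page(image="Some scanned text"), Page(image="")]  # second page is blank
ocr_document(pages, on_finished=lambda done: print("OCR finished for", len(done), "pages"))

# The blank page has no text but is still marked as processed.
print([(p.ocr_text, p.ocr_processed) for p in pages])
```

With the flag, a blank page (empty text, processed) can be told apart from a page whose OCR never ran (empty text, not processed). The hard part in a real deployment is the synchronization step, which here is hidden behind a simple blocking call.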

Sometimes we share extensive technical explanations (like now) of the reasons we haven't implemented a feature. We don't do this to shame the user asking the question, but in the hope that someone might have experience or an implementation idea to help with the solution.
