Issue with OCR on Watched or Staging folders

When things don't work as they should.
ln-ap
Posts: 2
Joined: Fri Oct 19, 2018 11:49 am

Issue with OCR on Watched or Staging folders

Post by ln-ap » Fri Oct 19, 2018 12:08 pm

I'm having an issue with content automatically uploaded through watched or staging folders. OCR works fine for manual uploads through the web form.

I've converted the database, started over with a fresh container and DB, and consulted the FAQ here: https://mayan.readthedocs.io/en/v3.1.2/topics/faq.html

Here is the error in the logs:
mayan-app | ocr.managers <68> [ERROR] "process_document_version() line 66 OCR error for document version: 12; (1366, "Incorrect string value: '\\xEF\\xAC\\x81rei...' for column 'content' at row 1")"

mayan-app | [2018-10-19 11:47:01,753: ERROR/Worker-1] OCR error for document version: 12; (1366, "Incorrect string value: '\\xEF\\xAC\\x81rei...' for column 'content' at row 1")
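
For anyone reading the error above: the escaped bytes can be decoded to see which character the database is rejecting. A minimal Python sketch, assuming the bytes in the log are UTF-8 (the variable names are only illustrative):

```python
import unicodedata

# Bytes copied from the error message: '\xEF\xAC\x81', followed by "rei...".
raw = b"\xef\xac\x81"

# Decode as UTF-8 (assumption: the OCR text was UTF-8 encoded).
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}", unicodedata.name(char))
print(len(raw), "bytes in UTF-8")

# Check whether the character can be represented in Latin-1 at all.
try:
    char.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print("fits in latin-1:", fits_latin1)
```

The rejected character is a typographic ligature, which OCR output can contain and a manually typed test document probably would not. Checking the character set of the `content` column against this is likely a useful first step.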

I'm running a MySQL backend and docker-compose for Mayan and the supporting containers. Any help is appreciated.

ln-ap
Posts: 2
Joined: Fri Oct 19, 2018 11:49 am

Re: Issue with OCR on Watched or Staging folders

Post by ln-ap » Sat Oct 20, 2018 6:18 pm

Hi, just wanted to provide an update. I spun up another DB container with a different version of MySQL (mysql/mysql-server:5.6.41) and am not having any issues with OCR or string warnings. The MySQL server version used in the production backend is 5.7.23. I also have another MySQL server at work, version 5.7.19, that does not give any issues either; its collation and charset are the same.

To recap results:
- MySQL 5.7.23 gave me the original errors
- MySQL 5.6.41 no issues
- MySQL 5.7.19 in another environment works fine (container accessing mysql backend server)

Here is a copy of my docker-compose file that works:

version: '2'

services:
  broker:
    container_name: mayan-edms-broker
    image: healthcheck/rabbitmq
    environment:
      RABBITMQ_DEFAULT_USER: mayan
      RABBITMQ_DEFAULT_PASS: mayan
      RABBITMQ_DEFAULT_VHOST: mayan
    volumes:
      - ./broker:/var/lib/rabbitmq
  results:
    container_name: mayan-edms-results
    image: healthcheck/redis
    volumes:
      - ./results:/data
  db:
    container_name: mayan-edms-dbmysql
    image: mysql/mysql-server:5.6.41
    environment:
      MYSQL_DATABASE: mayan
      MYSQL_USER: mayan123
      MYSQL_PASSWORD: mayan456
  mayan-edms:
    container_name: mayan-edms-app
    image: mayanedms/mayanedms:3.1.2
    links:
      - db
    environment:
      MAYAN_BROKER_URL: amqp://mayan:mayan@broker:5672/mayan
      MAYAN_CELERY_RESULT_BACKEND: redis://results:6379/0
      MAYAN_DATABASE_ENGINE: django.db.backends.mysql
      MAYAN_DATABASE_HOST: db
      MAYAN_DATABASE_NAME: mayan
      MAYAN_DATABASE_USER: mayan123
      MAYAN_DATABASE_PASSWORD: mayan456
      MAYAN_DATABASE_CONN_MAX_AGE: 60
    ports:
      - "8086:8000"
    volumes:
      - ./app:/var/lib/mayan
      - ./scans:/opt/scans

Crayiii
Posts: 9
Joined: Fri Aug 24, 2018 12:25 am

Re: Issue with OCR on Watched or Staging folders

Post by Crayiii » Wed Nov 07, 2018 4:03 pm

I had this issue in the past. Here's how I fixed it (copied from the old forum):

Mayan EDMS tries to insert utf8mb4 data into the content column of the document_ocr table (I think that is the name), which was set to utf8.

I converted the database, the table, and that column to utf8mb4, and now everything is working.
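
The conversion described above can be sketched as a small Python helper that only builds the MySQL statements, so they can be reviewed before being run in a MySQL client. This is a sketch under assumptions, not the exact commands used in the original fix: the helper name is hypothetical, `mayan` is the database name from the compose file earlier in the thread, `document_ocr` is the table name guessed above (check the real name with `SHOW TABLES` first), and the `utf8mb4_unicode_ci` collation is an assumed choice.

```python
def charset_conversion_sql(database, tables,
                           charset="utf8mb4",
                           collation="utf8mb4_unicode_ci"):
    """Build MySQL statements that switch a database and tables to `charset`.

    Hypothetical helper: it returns SQL strings and never touches a server.
    """
    def quote(identifier):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    # Change the default character set for tables created later.
    statements = [
        f"ALTER DATABASE {quote(database)} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    ]
    # CONVERT TO also converts the existing character columns of each table.
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # "document_ocr" is a placeholder; substitute the actual table name.
    for statement in charset_conversion_sql("mayan", ["document_ocr"]):
        print(statement)
```

Because `CONVERT TO CHARACTER SET` rewrites the table's character columns, the table-level statement also covers the column step. Take a backup before running anything like this against a production database.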

rosarior
Posts: 159
Joined: Tue Aug 21, 2018 3:28 am

Re: Issue with OCR on Watched or Staging folders

Post by rosarior » Thu Nov 15, 2018 6:19 am

Thank you both for the update on this. The FAQ has some information on this (https://docs.mayan-edms.com/topics/faq. ... t-at-row-1) although I admit it is buried a bit. I'll see if we can move this to a "Troubleshooting" section under each chapter to make it easier to find.
