Document Content Empty
-
- Posts: 17
- Joined: Wed Oct 03, 2018 3:54 pm
Document Content Empty
Hi all,
I was wondering if Mayan is supposed to show "Content" for non-PDF files? My PDFs all have nice and complete "Content" sections, but docx, xls, xlsx, etc. don't have any "Content" and are only searchable using the OCR text (which is sketchy on some of the Excel files).
Page previews, OCR, etc. are all working fine. No parsing errors are showing in the web interface either.
If this is supposed to work, can anyone give me a starting point for checking things? I went looking for the error.log file and it doesn't exist where the settings say it should be, so either I have no errors or the web interface settings are lying to me about where the error.log file is located.
Any help would be appreciated.
Thanks,
Rob
-
- Posts: 17
- Joined: Wed Oct 03, 2018 3:54 pm
Re: Document Content Empty
Nobody else running Mayan in Docker is having this issue? It happens every time I install with Docker.
I just did a clean install on a new VM with Docker and found that the document_cache folder was not automatically created, so I wasn't even getting the preview images or the OCR. I was also seeing this on the Docker demo in PWD. After I added the folder to the Docker volume, previews and OCR started working.
Is there another folder that stores the "Content" information that might be missing?
Any help would be really appreciated.
Thanks,
Rob
-
- 50 Posts
- Posts: 89
- Joined: Wed Aug 22, 2018 2:52 pm
Re: Document Content Empty
Hi Rob,
I run Mayan on an x86 Docker install, and everything seems to be working for me.
Could it possibly be a permissions issue with where the folders are being created? Make sure that the user running the Docker container can write to the root of wherever the folders are being created. Also, are there any errors in the Docker logs?
Code: Select all
docker logs -f [docker_container]
Hope that helps.
Kevin
Running Mayan-EDMS on: OpenMediaVault, (Docker plugin), on x86 dual-core
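Kevin's permission check can also be scripted. The following is a minimal sketch, not part of Mayan: it assumes the media volume lives at /docker-volumes/mayan-edms/media on the host (adjust to your own setup) and that document_cache and document_storage are the subfolders of interest, as named in this thread. The function name check_media_folders is hypothetical.

```python
import os
from pathlib import Path


def check_media_folders(media_root, subfolders=("document_cache", "document_storage")):
    """Report, for each expected subfolder, whether it exists and is writable.

    The subfolder names are the ones mentioned in this thread; other Mayan
    versions may use different names.
    """
    report = {}
    for name in subfolders:
        folder = Path(media_root) / name
        report[name] = {
            "exists": folder.is_dir(),
            # os.access checks the permissions of the user running this script,
            # so run it as the same user as the container process.
            "writable": folder.is_dir() and os.access(folder, os.W_OK),
        }
    return report


if __name__ == "__main__":
    # Assumed host path of the media volume; replace with your own.
    for name, status in check_media_folders("/docker-volumes/mayan-edms/media").items():
        print(name, status)
```

A folder reported as missing or not writable would be the first thing to fix before looking at parsing itself.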
-
- Posts: 17
- Joined: Wed Oct 03, 2018 3:54 pm
Re: Document Content Empty
Kevin,
I'm running docker-compose according to the compose yml in the GitLab repo.
I'm continuously getting the errors shown below in the db container and in the app container. I really have no idea how to interpret them, other than that maybe the DB is corrupted and that is causing the app errors. There don't seem to be any permission errors, since Docker is installed as root and the document_cache and document_storage folders are both being written to.
Errors in the db container:
Code: Select all
2018-12-20 11:00:00.089 UTC [86] ERROR: column "document_version__document__date_added" does not exist at character 29
2018-12-20 11:00:00.089 UTC [86] STATEMENT: SELECT (date_trunc('month', document_version__document__date_added)) AS "d", COUNT("documents_documentpage"."id") AS "agg" FROM "documents_documentpage" INNER JOIN "documents_documentversion" ON ("documents_documentpage"."document_version_id" = "documents_documentversion"."id") INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document_version__document__date_added))
2018-12-20 12:00:00.051 UTC [86] ERROR: column "document__date_added" does not exist at character 29
2018-12-20 12:00:00.051 UTC [86] STATEMENT: SELECT (date_trunc('month', document__date_added)) AS "d", COUNT("documents_documentversion"."id") AS "agg" FROM "documents_documentversion" INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document__date_added))
2018-12-20 12:00:00.070 UTC [86] ERROR: column "document_version__document__date_added" does not exist at character 29
2018-12-20 12:00:00.070 UTC [86] STATEMENT: SELECT (date_trunc('month', document_version__document__date_added)) AS "d", COUNT("documents_documentpage"."id") AS "agg" FROM "documents_documentpage" INNER JOIN "documents_documentversion" ON ("documents_documentpage"."document_version_id" = "documents_documentversion"."id") INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document_version__document__date_added))
2018-12-20 13:00:00.057 UTC [86] ERROR: column "document__date_added" does not exist at character 29
2018-12-20 13:00:00.057 UTC [86] STATEMENT: SELECT (date_trunc('month', document__date_added)) AS "d", COUNT("documents_documentversion"."id") AS "agg" FROM "documents_documentversion" INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document__date_added))
2018-12-20 13:00:00.076 UTC [86] ERROR: column "document_version__document__date_added" does not exist at character 29
2018-12-20 13:00:00.076 UTC [86] STATEMENT: SELECT (date_trunc('month', document_version__document__date_added)) AS "d", COUNT("documents_documentpage"."id") AS "agg" FROM "documents_documentpage" INNER JOIN "documents_documentversion" ON ("documents_documentpage"."document_version_id" = "documents_documentversion"."id") INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document_version__document__date_added))
2018-12-20 14:00:00.068 UTC [86] ERROR: column "document__date_added" does not exist at character 29
2018-12-20 14:00:00.068 UTC [86] STATEMENT: SELECT (date_trunc('month', document__date_added)) AS "d", COUNT("documents_documentversion"."id") AS "agg" FROM "documents_documentversion" INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document__date_added))
2018-12-20 14:00:00.090 UTC [86] ERROR: column "document_version__document__date_added" does not exist at character 29
2018-12-20 14:00:00.090 UTC [86] STATEMENT: SELECT (date_trunc('month', document_version__document__date_added)) AS "d", COUNT("documents_documentpage"."id") AS "agg" FROM "documents_documentpage" INNER JOIN "documents_documentversion" ON ("documents_documentpage"."document_version_id" = "documents_documentversion"."id") INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document_version__document__date_added))
Errors in the app container:
Code: Select all
[2018-12-20 14:00:00,039: ERROR/MainProcess] Task mayan_statistics.tasks.task_execute_statistic[f7a7f683-2eb3-4bf8-ac82-849319c2084e] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/tasks.py", line 16, in task_execute_statistic
Statistic.get(slug=slug).execute()
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/classes.py", line 120, in execute
self.store_results(results=self.func())
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 146, in total_document_per_month
): qss.until(datetime.date(year, next_month, 1))
IndexError: list index out of range
[2018-12-20 14:00:00,059: ERROR/MainProcess] Task mayan_statistics.tasks.task_execute_statistic[86b76329-5d32-4301-af09-eadd05349e00] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/tasks.py", line 16, in task_execute_statistic
Statistic.get(slug=slug).execute()
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/classes.py", line 120, in execute
self.store_results(results=self.func())
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 183, in total_document_version_per_month
): qss.until(datetime.date(year, next_month, 1))
IndexError: list index out of range
[2018-12-20 14:00:00,066: ERROR/MainProcess] Task mayan_statistics.tasks.task_execute_statistic[c2703399-8017-467b-b1e1-973abd1aab38] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/tasks.py", line 16, in task_execute_statistic
Statistic.get(slug=slug).execute()
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/classes.py", line 120, in execute
self.store_results(results=self.func())
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 34, in new_documents_per_month
qss.time_series(start=this_year, end=today, interval='months')
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 33, in <lambda>
lambda x: {force_text(MONTH_NAMES[x[0].month]): x[1]},
IndexError: list index out of range
[2018-12-20 14:00:00,087: ERROR/MainProcess] Task mayan_statistics.tasks.task_execute_statistic[236d3ff4-3517-4fa9-8af5-ca0746d3d05f] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/tasks.py", line 16, in task_execute_statistic
Statistic.get(slug=slug).execute()
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/classes.py", line 120, in execute
self.store_results(results=self.func())
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 96, in new_document_versions_per_month
qss.time_series(start=this_year, end=today, interval='months')
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 95, in <lambda>
lambda x: {force_text(MONTH_NAMES[x[0].month]): x[1]},
IndexError: list index out of range
[2018-12-20 14:00:00,114: ERROR/MainProcess] Task mayan_statistics.tasks.task_execute_statistic[c7ebc09c-0dbd-48b6-aa0c-9f654bdb5d93] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/tasks.py", line 16, in task_execute_statistic
Statistic.get(slug=slug).execute()
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/classes.py", line 120, in execute
self.store_results(results=self.func())
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 56, in new_document_pages_per_month
qss.time_series(start=this_year, end=today, interval='months')
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 55, in <lambda>
lambda x: {force_text(MONTH_NAMES[x[0].month]): x[1]},
IndexError: list index out of range
[2018-12-20 14:00:00,139: ERROR/MainProcess] Task mayan_statistics.tasks.task_execute_statistic[6d41810c-def5-4b7f-8135-463906f662c1] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/tasks.py", line 16, in task_execute_statistic
Statistic.get(slug=slug).execute()
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/mayan_statistics/classes.py", line 120, in execute
self.store_results(results=self.func())
File "/opt/mayan-edms/local/lib/python2.7/site-packages/mayan/apps/documents/statistics.py", line 220, in total_document_page_per_month
): qss.until(datetime.date(year, next_month, 1))
IndexError: list index out of range
[2018-12-20 14:40:18 +0000] [154] [INFO] Autorestarting worker after current request.
[2018-12-20 14:40:19 +0000] [154] [INFO] Worker exiting (pid: 154)
[2018-12-20 14:40:20 +0000] [158] [INFO] Booting worker with pid: 158
Thanks for your suggestions!
Rob
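One possible reading of the app container tracebacks: they are all dated in December, and the failing lambda indexes MONTH_NAMES with x[0].month, which is 12 in December. If MONTH_NAMES is a plain 12-item list (an assumption; its definition is not shown in this thread), valid indexes run from 0 to 11 and index 12 raises exactly this IndexError. A minimal sketch of that failure mode, using a hypothetical MONTH_NAMES and a hypothetical month_label_fixed helper:

```python
import datetime

# Hypothetical stand-in for Mayan's MONTH_NAMES; the real definition is not
# shown in the thread. A 12-item list has valid indexes 0..11.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_label(date):
    """Mirror the indexing from the traceback: MONTH_NAMES[date.month]."""
    return MONTH_NAMES[date.month]


def month_label_fixed(date):
    """Hypothetical fix: datetime months are 1-based, list indexes 0-based."""
    return MONTH_NAMES[date.month - 1]


if __name__ == "__main__":
    december = datetime.date(2018, 12, 20)
    try:
        month_label(december)
    except IndexError as error:
        print("month_label failed:", error)
    print("month_label_fixed:", month_label_fixed(december))
```

If this reading is right, it would explain the statistics task failures but not necessarily the empty "Content" sections; the two may be unrelated.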
-
- Posts: 17
- Joined: Wed Oct 03, 2018 3:54 pm
Re: Document Content Empty
Did some more digging on a new install of Mayan. The test files used to check parsing were docx and txt files.
-Created a new Debian 9 VM with root access.
-Loaded docker and docker-compose under the root user.
-Installed Mayan-EDMS via the docker-compose.yml available through GitLab (4 containers).
This process gave the same issues: Mayan created all of the subfolders EXCEPT document_cache. After I created the folder manually it was able to preview and OCR the documents, but still no document "Content" was parsed.
Removed all volumes, images, containers, etc.
Then I tried following the 2 container instructions from the documentation:
Code: Select all
Using a dedicated Docker network
Use this method to avoid having to expose PostgreSQL port to the host’s network or if you have other PostgreSQL instances but still want to use the default port of 5432 for this installation.
Create the network:
docker network create mayan
Launch the PostgreSQL container with the network option and remove the port binding (-p 5432:5432):
docker run -d \
--name mayan-edms-postgres \
--network=mayan \
--restart=always \
-e POSTGRES_USER=mayan \
-e POSTGRES_DB=mayan \
-e POSTGRES_PASSWORD=mayanuserpass \
-v /docker-volumes/mayan-edms/postgres:/var/lib/postgresql/data \
-d postgres:9.5
Launch the Mayan EDMS container with the network option and change the database hostname to the PostgreSQL container name (mayan-edms-postgres) instead of the IP address of the Docker host (172.17.0.1):
docker run -d \
--name mayan-edms \
--network=mayan \
--restart=always \
-p 80:8000 \
-e MAYAN_DATABASE_ENGINE=django.db.backends.postgresql \
-e MAYAN_DATABASE_HOST=mayan-edms-postgres \
-e MAYAN_DATABASE_NAME=mayan \
-e MAYAN_DATABASE_PASSWORD=mayanuserpass \
-e MAYAN_DATABASE_USER=mayan \
-e MAYAN_DATABASE_CONN_MAX_AGE=60 \
-v /docker-volumes/mayan-edms/media:/var/lib/mayan \
mayanedms/mayanedms:latest
To be sure there weren't any permission issues, I set the Docker volume folder to have 777 permissions. This gave the same result: document_storage was created to store the first file on upload, but the document_cache folder was not, so there were no previews and no OCR on startup. After I manually created the folder, previews and OCR worked. Still no parsed document content.
Update:
I checked the log files of the DB container and found this after uploading only 2 files (the uploads themselves showed no errors):
Code: Select all
LOG: database system was shut down at 2018-12-20 16:26:28 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
ERROR: column "document__date_added" does not exist at character 29
STATEMENT: SELECT (date_trunc('month', document__date_added)) AS "d", COUNT("documents_documentversion"."id") AS "agg" FROM "documents_documentversion" INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document__date_added))
ERROR: column "document_version__document__date_added" does not exist at character 29
STATEMENT: SELECT (date_trunc('month', document_version__document__date_added)) AS "d", COUNT("documents_documentpage"."id") AS "agg" FROM "documents_documentpage" INNER JOIN "documents_documentversion" ON ("documents_documentpage"."document_version_id" = "documents_documentversion"."id") INNER JOIN "documents_document" ON ("documents_documentversion"."document_id" = "documents_document"."id") WHERE "documents_document"."date_added" BETWEEN '2018-01-01T00:00:00+00:00'::timestamptz AND '2018-12-31T23:59:59.999999+00:00'::timestamptz GROUP BY (date_trunc('month', document_version__document__date_added))
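The PostgreSQL errors above point at the GROUP BY expression: document__date_added looks like a Django-style field lookup path rather than a real column, while the WHERE clause of the same statement uses the qualified column "documents_document"."date_added". A minimal sketch that reproduces the same class of error, using SQLite from the Python standard library as a stand-in for PostgreSQL and strftime in place of date_trunc (both substitutions are assumptions made to keep the example self-contained; the table names come from the logged statement, the helper names are hypothetical):

```python
import sqlite3


def build_demo_db():
    """Create two tables named after those in the logged statement."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents_document (
            id INTEGER PRIMARY KEY,
            date_added TEXT
        );
        CREATE TABLE documents_documentversion (
            id INTEGER PRIMARY KEY,
            document_id INTEGER REFERENCES documents_document (id)
        );
        INSERT INTO documents_document VALUES (1, '2018-12-20 11:00:00');
        INSERT INTO documents_documentversion VALUES (1, 1), (2, 1);
        """
    )
    return conn


QUERY = """
SELECT strftime('%Y-%m', {column}) AS d,
       COUNT(documents_documentversion.id) AS agg
FROM documents_documentversion
INNER JOIN documents_document
    ON documents_documentversion.document_id = documents_document.id
GROUP BY strftime('%Y-%m', {column})
"""


def versions_per_month(conn, column):
    """Run the aggregate, grouping on the given column expression."""
    return conn.execute(QUERY.format(column=column)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    try:
        # Lookup-path name, as in the failing statement: not a real column.
        versions_per_month(conn, "document__date_added")
    except sqlite3.OperationalError as error:
        print("failed:", error)
    # Qualified column name, as in the WHERE clause of the logged statement.
    print(versions_per_month(conn, "documents_document.date_added"))
```

The first call fails because no column with that name exists; the second succeeds because it names the joined table's column directly.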
Re: Document Content Empty
Thanks for finding the source of the issue and opening the tickets; that will make these easier to solve.
Tickets:
https://gitlab.com/mayan-edms/mayan-edms/issues/549
https://gitlab.com/mayan-edms/mayan-edms/issues/550
-
- Posts: 14
- Joined: Thu Sep 12, 2019 8:46 am
Re: Document Content Empty
Hi Rob..
Sorry to bump this thread again. Would you mind sharing how you solved the problem of empty content for non-PDF docs? I have the same problem: preview and OCR text are all good, only the content is empty.
Documents I have tried to upload so far: txt, docx, and xlsx. The resulting content is always empty.
I use Mayan v3.2.7 with Docker.
Re: Document Content Empty
Hi,
No problem with bumping old topics, that's what the forum is for!
Can you share a document that exhibits the problem so that we can test it locally? Anything without confidential information will do. If you can trigger it using a public document from the web, even better. Thanks.
-
- Posts: 14
- Joined: Thu Sep 12, 2019 8:46 am
Re: Document Content Empty
Hi, sorry for the delay in responding; I just came back from duty.
Unfortunately, I just removed the Docker installation of Mayan and am now trying to reinstall Mayan using direct deployment.
From what I remember, the document I uploaded was just a simple, newly created MS Word document with one line of random words, saved with the docx extension.
When I uploaded it to Mayan, the OCR worked well, but the document parsing did not.
Any advice on this?
Thank you.
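To rule out the test file itself, it can help to confirm outside of Mayan that the .docx actually contains extractable text. A .docx file is a ZIP archive whose main body is stored in word/document.xml, with text runs in w:t elements. A minimal sketch using only the Python standard library; the function name docx_text is hypothetical, and this is not Mayan's parser, only an independent check:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    import sys

    # Usage: python docx_text.py sample.docx
    if len(sys.argv) > 1:
        print(docx_text(sys.argv[1]))
```

If this prints the expected line of text, the file is fine and the problem is more likely on the parsing side.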
-
- Posts: 14
- Joined: Thu Sep 12, 2019 8:46 am
Re: Document Content Empty
Hi Rosario,
Just finished deploying this great app using direct deployment.
I tested the parse function again but still couldn't find the answer; the content is still empty.
These are the sample files I uploaded to Mayan:
https://1drv.ms/u/s!ApaK9u60Bn-xhdlM8LN ... w?e=8FAoi4
I also found this error from PostgreSQL:
https://prnt.sc/pht3ub
Thank you..