Automatically Move Scanned Documents to Watch Folder

m42e
Posts: 6
Joined: Fri Oct 11, 2019 8:21 am

Automatically Move Scanned Documents to Watch Folder

Post by m42e

Mod Note: The project below automatically moves documents from your scanner's output folder to the Mayan Watch Folder of your choice once they are ready for processing. It has not been tested by the Mayan team and is a third-party project, so please report any issues directly to m42e.
Not everyone will need it: it exists to work around known issues with certain combinations of scanners, operating systems and shared-folder technologies.
You will know you are affected if Mayan tries to process a document before the scanner has finished writing it, resulting in a failed upload or a corrupted document in Mayan.
This happens because some scanner models do not implement file locking correctly, or because the combination of operating system and shared storage does not support the type of file locking that Mayan's upstream projects depend on. As a result, Mayan doesn't know the file is still being written and tries to process it too soon.
The issue is not universal: it only occurs with some scanners, on some operating systems, with some file-sharing technologies.

We are working on a permanent solution.

-----------
I recently ran into some issues with a document scanner that can upload documents directly to an SFTP share.
This share is watched by Mayan, but sometimes the documents are invalid when imported.
I found that this happens when Mayan tries to index the file before the upload/scan has completed.

I decided to solve this in a general way, so that it can also be used for the same issue with other tools.

It simply watches a folder (using file-system notifications) and, as soon as a file is complete (no more changes for a certain amount of time), moves it to the Mayan input folder.
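The "move when complete" idea can be sketched in Python roughly like this. It is not m42e's implementation: it polls the folder instead of using file-system notifications, and the names `move_when_quiet` and `run`, the 10-second quiet period and any folder paths are hypothetical. A file counts as complete once its size and modification time have stayed unchanged for `quiet_seconds`.

```python
import shutil
import stat
import time
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical bookkeeping: path -> ((size, mtime_ns), time the last change was seen)
State = Dict[Path, Tuple[Tuple[int, int], float]]


def move_when_quiet(incoming: Path, watch: Path, state: State,
                    now: float, quiet_seconds: float = 10.0) -> List[Path]:
    """Move files unchanged for quiet_seconds from incoming to watch."""
    moved: List[Path] = []
    seen = set()
    for path in sorted(incoming.iterdir()):
        try:
            info = path.stat()
        except FileNotFoundError:
            continue  # the file vanished between listing and stat
        if not stat.S_ISREG(info.st_mode):
            continue  # skip directories, sockets, etc.
        seen.add(path)
        signature = (info.st_size, info.st_mtime_ns)
        previous = state.get(path)
        if previous is None or previous[0] != signature:
            # New file, or still being written: (re)start the quiet timer.
            state[path] = (signature, now)
            continue
        if now - previous[1] >= quiet_seconds:
            target = watch / path.name
            # shutil.move uses a rename when both folders share a filesystem;
            # across filesystems it copies, which is not atomic.
            shutil.move(str(path), str(target))
            del state[path]
            moved.append(target)
    # Forget files that disappeared on their own.
    for path in list(state):
        if path not in seen:
            del state[path]
    return moved


def run(incoming: Path, watch: Path, poll_interval: float = 2.0,
        quiet_seconds: float = 10.0) -> None:
    """Poll forever, timing the quiet period with a monotonic clock."""
    state: State = {}
    while True:
        move_when_quiet(incoming, watch, state, time.monotonic(), quiet_seconds)
        time.sleep(poll_interval)
```

Keeping the scanner's folder and the Mayan watch folder on the same filesystem makes the final move a single rename, so Mayan should never see a half-copied file.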

I'm happy to share it with you: https://github.com/m42e/docker-moveoncomplete

I hope this is useful to some of you.

Best
m42e
Last edited by rssfed23 on Tue Dec 31, 2019 1:09 pm, edited 1 time in total.

rssfed23
Moderator
Posts: 185
Joined: Mon Oct 14, 2019 1:18 pm
Location: United Kingdom

Re: Upload

Post by rssfed23

Thanks for sharing!

I imagine this is caused by the scanner writing directly to the file location before the entire scan has completed; then, as you say, Mayan pulls in an incomplete (broken) version. There is an open feature request to look into handling this: https://gitlab.com/mayan-edms/mayan-edms/issues/456

In the interim I'm sure your code will be beneficial to many. I hope you don't mind, but I'll rename the thread title to help people find it more easily. I'll also add something to the troubleshooting section that links to the script.
Please don't PM for general support; start a new thread with your issue instead.

pleblancq
Posts: 15
Joined: Sat Oct 26, 2019 2:09 pm

Re: A docker container that moves watch folder items automatically when ready for processing by Mayan

Post by pleblancq

I don't know if it's the same issue I had with my standalone macOS installation, but my Mayan EDMS watch directories (on an SMB share) were experiencing the same problem. With a 10-second watch interval, if the scanned file wasn't fully uploaded from the printer to the SMB share, the lock was somehow released and Mayan tried to move an incomplete file.

I changed the lockf call in watch_folder_sources.py to flock (line 63, v3.3.7) and have had no issues since.

rssfed23
Moderator
Posts: 185
Joined: Mon Oct 14, 2019 1:18 pm
Location: United Kingdom

Re: A docker container that moves watch folder items automatically when ready for processing by Mayan

Post by rssfed23

pleblancq wrote:
Wed Jan 01, 2020 2:09 pm
I changed watch_folder_sources.py lockf function to flock instead at line 63 (v3.3.7) and never had issues after.
There's an interesting discussion on lockf vs flock over at https://code.djangoproject.com/ticket/9400
The project we use went with lockf because it's more portable than flock (it works on NFS and other shared filesystems). Of course, we inherit whatever upstream recommends.
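To make the lockf/flock difference concrete, here is a small Python sketch (not Mayan's code; the helper name `can_lock` is hypothetical) that tries to take a non-blocking exclusive lock with either backend. In Python, `fcntl.lockf` wraps POSIX `fcntl()` record locks, while `fcntl.flock` calls `flock(2)`. On Linux local filesystems the two lock types do not interact, so a check only sees a writer that locked the file with the same mechanism, and neither sees a writer (such as a scanner uploading over SMB or SFTP) that takes no lock at all.

```python
import fcntl


def can_lock(path: str, backend: str = "lockf") -> bool:
    """Return True if an exclusive, non-blocking lock on path succeeds right now.

    backend is "lockf" (POSIX record lock via fcntl) or "flock" (flock(2)).
    The lock is released again when the file is closed at the end of the with-block.
    """
    lock = fcntl.flock if backend == "flock" else fcntl.lockf
    # An exclusive lockf lock needs a file opened for writing, hence "rb+".
    with open(path, "rb+") as handle:
        try:
            lock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:  # BlockingIOError or PermissionError when the lock is held
            return False
        return True
```

A flock lock belongs to one open file description, so even a second `open()` in the same process conflicts with it; fcntl record locks belong to the whole process instead, which is one of the semantic differences discussed in the Django ticket.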

There are also other variables: the issue itself could be caused by the specific scanner. There are so many scanner implementations of Samba/NFS that if we changed to flock we could break things for the majority of users. It's a tough one! Feel free to log an issue for the dev team to take a look at, though (or let me know if you'd rather I did): if we can show that flock works better than lockf for most of our users, it's worth them looking into.

pleblancq
Posts: 15
Joined: Sat Oct 26, 2019 2:09 pm

Re: A docker container that moves watch folder items automatically when ready for processing by Mayan

Post by pleblancq

You can log it if you want. I have my own update procedure in which I change lockf to flock after each upgrade. It seemed similar to the issue I experienced.

For NFS shares, maybe the dev team could use the stat function to determine whether the directory is on NFS, and use lockf if so and flock otherwise.

Something like this, but in Python:
https://stackoverflow.com/questions/460 ... hellscript
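The suggestion above could be sketched in Python as follows, assuming Linux, where every mount point and its filesystem type are listed in `/proc/mounts`. Python's `os.stat` does not report the filesystem type, so this sketch parses `/proc/mounts` and picks the longest mount point containing the path. The function names and the rule "NFS uses lockf, everything else uses flock" only illustrate the idea in this post; they are not something Mayan implements. Where `/proc/mounts` does not exist (for example on macOS), it falls back to lockf.

```python
import os
import re
from typing import Optional

NFS_TYPES = {"nfs", "nfs4"}


def _unescape(field: str) -> str:
    """Decode the octal escapes (e.g. \\040 for a space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the fs type of the longest mount point containing an already-resolved path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = _unescape(fields[1]), fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def choose_lock_backend(fs_type: Optional[str]) -> str:
    """lockf on NFS (its portability advantage), flock everywhere else."""
    return "lockf" if fs_type in NFS_TYPES else "flock"


def lock_backend_for(path: str) -> str:
    try:
        with open("/proc/mounts") as handle:
            mounts_text = handle.read()
    except OSError:
        return "lockf"  # no /proc (e.g. macOS): keep the existing default
    return choose_lock_backend(filesystem_type(os.path.realpath(path), mounts_text))
```

Taking `mounts_text` as a parameter keeps the matching logic testable without a real NFS mount.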

Happy New Year to the Mayan Team

rssfed23
Moderator
Posts: 185
Joined: Mon Oct 14, 2019 1:18 pm
Location: United Kingdom

Re: A docker container that moves watch folder items automatically when ready for processing by Mayan

Post by rssfed23

Logged under https://gitlab.com/mayan-edms/mayan-edms/issues/456

But the more I think about it, the more I feel that if we had to choose between switching lock backends and implementing a workaround, I'd lean towards the latter. Not only because of NFS compatibility and drifting away from upstream, but also because, if the scanner sending the document isn't locking correctly, we may end up fixing it for some users (like yourselves in this thread) but breaking it for others.
Whichever lock backend we go with, we'll always have some users who experience issues, which is why I quite like what's proposed in
https://gitlab.com/mayan-edms/mayan-edms/issues/456 since, if done right, the end result should fix it for 99% of users (and remove the need for the container workaround in this thread).

But we'll see what the rest of the dev team think about it. Thanks for contributing your workaround Pleblancq feel free to comment on the issue logged and have a great 2020 yourself :)

pleblancq
Posts: 15
Joined: Sat Oct 26, 2019 2:09 pm

Re: A docker container that moves watch folder items automatically when ready for processing by Mayan

Post by pleblancq

One solution could be a config setting to use flock instead of lockf.

Then, if it doesn't work for someone, an FAQ entry could suggest trying flock.
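A setting like the one suggested here might look like this in Python, assuming it is read from an environment variable. The variable name `WATCH_FOLDER_LOCK_BACKEND` and the helper `get_lock_function` are hypothetical and for illustration only; the default stays `lockf`, so installations that work today would be unaffected.

```python
import fcntl
import os
from typing import Callable, Optional

# Hypothetical setting name; the default keeps the current lockf behaviour.
SETTING_NAME = "WATCH_FOLDER_LOCK_BACKEND"
LOCK_FUNCTIONS = {"lockf": fcntl.lockf, "flock": fcntl.flock}


def get_lock_function(value: Optional[str] = None) -> Callable:
    """Resolve the configured backend name to fcntl.lockf or fcntl.flock."""
    if value is None:
        value = os.environ.get(SETTING_NAME, "lockf")
    name = value.strip().lower()
    try:
        return LOCK_FUNCTIONS[name]
    except KeyError:
        raise ValueError(
            f"{SETTING_NAME} must be one of {sorted(LOCK_FUNCTIONS)}, got {value!r}"
        ) from None
```

With something like this, the hard-coded lockf call mentioned earlier in the thread would become `get_lock_function()(file_object, fcntl.LOCK_EX | fcntl.LOCK_NB)` (where `file_object` is the open file), and the FAQ entry would only need to say: set the variable to `flock` and restart.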

I like being able to run Mayan EDMS natively on macOS without Docker. I would be sad if something became so Linux-specific that it broke native installations.

Thanks!
