I am having issues with watch folders. I created a watch folder and when I perform a test, it adds a single document fine. The folder is set to scan every 600 seconds and there are over 199 PDF files in the published folder but they do not get added.
What can be causing the problem?
I verified the entry in docker-compose.yml and it is properly created.
What can I do to troubleshoot this issue more in-depth?
This is the designed behavior. The task is set to process only one file to make the task as atomic as possible.
If your deployment processes the files from the watch folder quickly, given the PDF sizes, you can lower the schedule time to 20 to 30 seconds.
There is no corruption scenario with low schedule times. The most common scenario is a duplicated PDF, as two tasks can try to upload the same file if the first task takes longer than the schedule time.
We are considering adding the option of a “batch size” to allow users to specify how many files to process for each invocation. The downside is that a number that is too large will create a task that runs in the background for many minutes and could be identified as a hung task and killed.
Another improvement we are discussing is “locking” the file to a specific task, this way many parallel tasks can be launched, each processing a single file for fast ingestion of folders.
These are just two improvements being considered for future versions.
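To make the "batch size" and "locking" ideas more concrete, here is a minimal sketch of how they could fit together. It is not the actual Mayan EDMS implementation. The function names, the `.lock` suffix, and the `batch_size` parameter are all hypothetical, and it assumes a local filesystem where exclusive file creation is atomic.

```python
import os


def try_claim(path):
    """Try to claim `path` for the calling task; return True on success."""
    lock_path = path + ".lock"  # hypothetical naming convention
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists,
        # so only one task can win the claim for a given file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def claim_batch(folder, batch_size=1):
    """Claim up to `batch_size` unclaimed files from `folder`."""
    claimed = []
    for name in sorted(os.listdir(folder)):
        if len(claimed) >= batch_size:
            break
        if name.endswith(".lock"):
            continue  # skip the lock files themselves
        full = os.path.join(folder, name)
        if os.path.isfile(full) and try_claim(full):
            claimed.append(full)
    return claimed
```

With something like this, two tasks that overlap would each get a different set of files instead of both uploading the same PDF. A real version would also need to remove the lock once the file is processed and clean up stale locks left by killed tasks.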
I understand perfectly the behavior regarding the test. That works as expected.
What's not working is that it never processes any other files.
If I press test, it grabs a new one from the folder but then nothing else happens every 5 minutes.
I have well over 100 PDF documents there and none of them have been imported automatically.
Upgrade to at least version 4.5.4; before that there were some issues with the watched folders.
Check the error log to see if there is an issue being reported. A common issue is filesystem permissions.
Also run:
docker compose logs -f
to see if there are any errors being reported at the console level.
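For the filesystem permissions check mentioned above, a small script along these lines can list the files the current user cannot read. This is only a sketch. The function name is made up for illustration, and it only tells you something useful if you run it as the same user, and inside the same container, that executes the watch folder task.

```python
import os


def unreadable_files(folder):
    """Return the files in `folder` that the current user cannot read."""
    problems = []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        # os.access checks the permissions of the user running the script.
        if os.path.isfile(full) and not os.access(full, os.R_OK):
            problems.append(full)
    return problems
```

Call it with the path of the watch folder as seen from inside the container. An empty list means read permissions are probably not the problem. Note that it always returns an empty list when run as root, since root can read everything.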