First time user, starting tips?

I am a brand new user and I’m going to spend this weekend getting a first set of documents visible for my users. I’ve read most of the 27 articles in the Knowledge Base and some of the tutorials, but I don’t see anything aimed specifically at those of us who are starting from scratch.

I have a Proxmox hosting environment. I installed Ubuntu in a VM, installed Docker, and used the Portainer deploy for Mayan to get it running. I have experience with Open Semantic Search (OSS), to the level of building all the components by hand. It uses Celery and RabbitMQ and is also a Django-based app, so everything about Mayan looks really familiar to me. I have done other work involving Elasticsearch, so I’m headed towards using that long term. I write some Python and use PyCharm as a dev environment, but this is mostly glue for the various systems I use; I don’t describe myself as a programmer.

My documents are all in PDF format. There are 670,000 at the moment, and once I get Mayan running that number will increase steadily. These are all records requests that I need to make accessible to a handful of researchers. They are stored in a two- or three-level hierarchy: an AREA of interest, a GROUP, and in some cases a SUBGROUP. The folder structure looks like this:

/OSS/area1/group1
/OSS/area2/group2/subgroup2
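
To make the layout concrete, here is a rough Python sketch of how I think of a file path mapping onto AREA / GROUP / SUBGROUP. Everything in it is my own illustration: `Location`, `parse_location`, and the file name `request.pdf` are hypothetical names, not part of Mayan, and it assumes every PDF sits exactly two or three directories below `/OSS`.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """Hypothetical record for where a PDF lives in the hierarchy."""
    area: str
    group: str
    subgroup: Optional[str]  # None when the file sits directly in a group


def parse_location(pdf_path: str, root: str = "/OSS") -> Location:
    """Split a PDF path into area, group and optional subgroup.

    Assumes the layout root/area/group[/subgroup]/file.pdf.
    Raises ValueError if the path is outside root or has another depth.
    """
    # Directory components between the root and the file itself.
    parts = PurePosixPath(pdf_path).parent.relative_to(root).parts
    if len(parts) not in (2, 3):
        raise ValueError(f"expected area/group[/subgroup], got {parts!r}")
    subgroup = parts[2] if len(parts) == 3 else None
    return Location(parts[0], parts[1], subgroup)


# Example calls with a made-up file name:
two_level = parse_location("/OSS/area1/group1/request.pdf")
three_level = parse_location("/OSS/area2/group2/subgroup2/request.pdf")
```

Something like this is what I would hope to turn into metadata or tags on import, but I don't yet know how that is best done in Mayan.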

New documents require a vigorous scrubbing before they can be used. They arrive in large batches that are either added to an existing folder or given their own areaX/groupY designation.

I think what I want here is the watch folder source rather than a staging folder. Can I just do what I’d do with OSS? I would normally mount the document file system on Proxmox using NFS. My strong preference is not to have to keep the documents in a native file system of the VM itself.

New Knowledge base article for new users: Getting started