Novice Question (seeking Template)

Hi all,
New here - thanks for making the most stable open source EDMS I could find in the world! I subscribed for a year and will try to learn as much as I can from the docs, but sadly I’m not much of a programmer or IT professional, though I have dabbled heavily and I do host servers for my practice.

I need a bunch of pdf documents to be form field searchable across text fields to find keywords. I think whoosh or whoop is the one which is easier than elasticsearch (saw on this forum) so I’ll go with that. Once I get it on Docker/Docker compose, are there any templates or starting points for just having a stable searchable text database?

Theoretically, my needs (other than perhaps design UX) could probably be met in an Excel spreadsheet with a VLOOKUP table system of some type - but I’m here because a DMS is more scalable and Mayan is robust/reliable. We don’t know how many PDFs we will eventually want to load in there (maybe GB one day) or if we’ll upgrade to something like an NLP search one day so I’m hoping there is an easy way to get started with Mayan.

Please let me know if any guide exists that can make this text-field searchable setup complete by a novice in 1 day? Where should I start (other than installing it on docker)? Many thanks, kind friends of Mayan EDMS!

If your PDFs contain embedded text rather than scanned images, they will already be searchable.

If they are images, Mayan has a built-in OCR engine. You just need to make sure it’s set to run in your Document type settings.

If your documents don’t OCR well, there are third-party options like Mindee that you can connect to through their API to get clean data.


I need a bunch of pdf documents to be form field searchable across text fields to find keywords.

Mayan makes a lot of the document properties searchable, including things like the embedded text and the OCR content (if the document has no embedded text).

However Mayan has no knowledge about the context of the text. To perform keyword search you need to specify the keywords either when searching, or as an index template to classify the documents by the specific keywords of interest.
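To make the idea of classifying documents by keywords of interest more concrete, here is a minimal Python sketch. It is not Mayan's API or its index template syntax; the function name, the keyword list, and the sample file names are all hypothetical, and it assumes you already have each document's extracted text (embedded or OCR) as a plain string.

```python
from collections import defaultdict

# Hypothetical keywords of interest; replace with your own.
KEYWORDS = ("invoice", "contract", "referral")


def classify_by_keyword(documents, keywords=KEYWORDS):
    """Group document labels under each keyword found in their text.

    `documents` maps a label (e.g. a file name) to its extracted text.
    Matching is a case-insensitive substring test, so "invoice" also
    matches "invoices".
    """
    index = defaultdict(list)
    for label, text in documents.items():
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                index[keyword].append(label)
    return dict(index)


# Made-up sample data, for illustration only.
sample = {
    "a.pdf": "Invoice number 1001 for consultation",
    "b.pdf": "Signed contract and attached invoice",
    "c.pdf": "Meeting notes",
}
result = classify_by_keyword(sample)
print(result)
```

A document that matches none of the keywords (here `c.pdf`) simply does not appear in any group, which is roughly what happens when a keyword-based index has no branch for it.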

I think whoosh or whoop is the one which is easier than elasticsearch (saw on this forum) so I’ll go with that.

Whoosh is the default, so there is nothing to configure out of the box. ElasticSearch becomes a requirement when the repository grows, when the concurrency requirements increase (multiple users), when you need to scale (Whoosh is file based, so lock contention is common), or when more complete search logic is required.

How to enable Mayan EDMS to use Elasticsearch

if we’ll upgrade to something like an NLP search one day so I’m hoping there is an easy way to get started with Mayan.

That is the advantage of a DMS like Mayan: once the documents are in, other functionality can be added. We have been running Machine Learning and AI classifiers on Mayan for years now. Hence our patent on the matter:

Patent citation by Bank of America Corp.

Please let me know if any guide exists that can make this text-field searchable setup complete by a novice in 1 day? Where should I start (other than installing it on docker)? Many thanks, kind friends of Mayan EDMS!

Hope this helps.