First of all, many thanks for this project and the very active community! Mayan seems to be a very powerful DMS with lots of room for customization.
I started deploying Mayan using Docker on my Synology NAS and configured a few cabinets, document types, and a workflow to sort incoming documents.
What I'm missing from my previous DMS, “paperless”, is the ability to automatically recognize and assign the correspondent and the document type based on the OCR content.
I tried to do this using a workflow, but I'm stuck at the point where I need to select a piece of text in the OCR content that matches certain criteria (e.g. an address or a name) and then put this text into a metadata field.
(The same logic should be applied to get the address, billing amount, due date, etc.)
Has anybody implemented a similar feature or workflow and can help me out with that?
(I'm new to the Django templating language and am struggling a bit to find out which objects I can use in Mayan to access document, OCR, or workflow information in the template filters. Is there documentation I haven't found yet?)
Many thanks in advance for your replies and have a wonderful weekend!
Extracting specific data like addresses is best done by building a custom app, because the format of these kinds of attributes varies widely and requires flexible parsing.
For other types of data that use a more regular format, you can use a regular expression as described in the following topic.
You can then use the result of the template as the value for automatic indexing, or assign it to a metadata field via a workflow action, depending on your end goal.
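To illustrate the regular-expression approach outside of Mayan, here is a minimal Python sketch using only the standard `re` module. The patterns, the `extract_fields` function, and the `CORRESPONDENTS` keyword table are hypothetical and assume German-style amounts ("1.234,56") and dates ("31.01.2024"); this is not Mayan's templating API. How a working expression is then wired into a Mayan template depends on the template filters available in your Mayan version.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical patterns for German-style invoices ("1.234,56", "31.01.2024").
# They are illustrative only; adapt them to the layout of your own documents.
AMOUNT_RE = re.compile(
    r"(?:Total|Betrag)\s*:?\s*(?:EUR|€)?\s*(\d{1,3}(?:\.\d{3})*,\d{2})",
    re.IGNORECASE,
)
DUE_DATE_RE = re.compile(
    r"(?:Due date|Fällig am)\s*:?\s*(\d{2}\.\d{2}\.\d{4})",
    re.IGNORECASE,
)

# Hypothetical keyword -> correspondent table, checked in order (first match wins).
CORRESPONDENTS = {"acme gmbh": "ACME", "stadtwerke": "Stadtwerke"}


def extract_fields(ocr_text: str) -> dict:
    """Return the fields found in the OCR text; missing fields map to None."""
    fields = {"amount": None, "due_date": None, "correspondent": None}

    amount = AMOUNT_RE.search(ocr_text)
    if amount:
        # "1.234,56" -> "1234.56": drop thousands separators, then swap the decimal comma.
        fields["amount"] = Decimal(amount.group(1).replace(".", "").replace(",", "."))

    due = DUE_DATE_RE.search(ocr_text)
    if due:
        # Normalize to ISO 8601 so the value sorts correctly when stored as text.
        fields["due_date"] = datetime.strptime(due.group(1), "%d.%m.%Y").date().isoformat()

    # Simple case-insensitive keyword matching for the correspondent.
    lowered = ocr_text.lower()
    for keyword, name in CORRESPONDENTS.items():
        if keyword in lowered:
            fields["correspondent"] = name
            break

    return fields


if __name__ == "__main__":
    sample = "ACME GmbH\nRechnung Nr. 42\nBetrag: EUR 1.234,56\nFällig am: 31.01.2024\n"
    print(extract_fields(sample))
```

Testing the patterns against a few real OCR texts in plain Python first makes it easier to see which documents a pattern misses before moving it into a workflow.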
I haven't had time to configure Mayan further yet, but I will do so soon.
I will update this post with (hopefully) my results.
I'm planning to build it as a workflow with regular expressions. If my use case requires it, I'll try to build a custom app in Python, but if possible I'd like to stick to the built-in features.