How to put OCR-recognized text into a metadata field via a workflow

Hello everyone,

First of all, many thanks for this project and the quite active community! Mayan seems to be a very powerful DMS with lots of room for customization.

Anyway, I started implementing Mayan using Docker on my Synology NAS and configured a few cabinets, document types and a workflow to sort incoming documents.
What I am missing from my previous DMS, called “paperless”, is the ability to automatically recognize and assign the correspondent and the document type based on the OCR content.

I tried to do this using a workflow, but I’m failing at the point where I need to select a piece of text in the OCR content that matches certain criteria (e.g. find an address or a name in the document) and then put this text into a metadata field.
(The same logic should be applied to get the address, billing amount, due date, etc.)

Has somebody implemented a similar feature or workflow and could help me out with that?
(I’m new to the Django templating language and struggle a little to find out which objects I can use in Mayan to select document, OCR or workflow information for use in the template filters. Is there documentation that I haven’t found yet?)

Many thanks in advance for your replies and have a wonderful weekend!

Best Regards

Any feedback from anyone?
It would be nice if somebody could help me.

Thanks and Happy Easter, everyone!

Extracting specific data like addresses is best done by building a custom app, because the format of these kinds of attributes is very strict and requires flexible parsing.

For other types of data that use a more regular format, you can use a regular expression as described in the following topic.

You can then use the result of the template as the value for automatic indexing, or assign it to a metadata field via a workflow action, depending on your end goal.
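
To give an idea of what such a regular expression could look like, here is a minimal sketch in plain Python (not Mayan template syntax). The patterns, the `extract_first` helper and the sample OCR text are all made up for illustration; real documents will need patterns adapted to their own layout, and the same expressions would still have to be carried over into a template.

```python
import re
from typing import Optional

# Hypothetical patterns; adjust them to the wording of your own documents.
DUE_DATE_PATTERN = re.compile(r"Due date:\s*(\d{2}\.\d{2}\.\d{4})", re.IGNORECASE)
AMOUNT_PATTERN = re.compile(r"Total:\s*(\d+(?:[.,]\d{2})?)\s*EUR", re.IGNORECASE)


def extract_first(pattern: re.Pattern, ocr_text: str) -> Optional[str]:
    """Return the first captured group, or None if the pattern does not match."""
    match = pattern.search(ocr_text)
    return match.group(1) if match else None


# Made-up OCR output standing in for the text of a scanned invoice.
sample_ocr_text = """Invoice 2021-001
Total: 123,45 EUR
Due date: 15.04.2021
"""

due_date = extract_first(DUE_DATE_PATTERN, sample_ocr_text)
amount = extract_first(AMOUNT_PATTERN, sample_ocr_text)
```

Returning `None` when nothing matches makes it easy to leave the metadata field empty instead of writing a wrong value.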


Hi Joko,

Did you manage to achieve this? I would like to do the same as you and I can’t…

Hi Jake,

I haven’t had the time to configure Mayan further, but I will do that soon.
I will update this post with (hopefully) my results.

I’m planning on building it as a workflow with regex. If necessary for my use case, I will try to build the custom app in Python. But if possible, I’d like to stick to the standards.
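
For the custom-app route, the part that guesses the correspondent could start out as a simple keyword lookup. This is only a rough sketch under my own assumptions: the `CORRESPONDENT_KEYWORDS` mapping and the `guess_correspondent` function are hypothetical names, nothing here uses Mayan's API, and the OCR text would still have to be fetched from the document and the result written back to a metadata field.

```python
from typing import Dict, List, Optional

# Hypothetical mapping: correspondent name -> keywords expected in the OCR text.
CORRESPONDENT_KEYWORDS: Dict[str, List[str]] = {
    "Example Energy Ltd": ["example energy", "customer number"],
    "Example Insurance": ["example insurance", "policy number"],
}


def guess_correspondent(ocr_text: str) -> Optional[str]:
    """Return the correspondent with the most keyword hits, or None if none hit."""
    # Lowercase and collapse whitespace so OCR line breaks do not split keywords.
    normalized = " ".join(ocr_text.lower().split())
    best_name: Optional[str] = None
    best_hits = 0
    for name, keywords in CORRESPONDENT_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in normalized)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name
```

Counting hits instead of stopping at the first match is a deliberate choice, since a generic phrase such as "customer number" could appear in letters from several senders.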