How to put text from OCR recognition in metadata field via workflow

Joko · March 25, 2023, 9:53am

Hello everyone,

First of all, many thanks for this project and the quite active community! Mayan seems to be a very powerful DMS with lots of room for customization.

Anyway, I started implementing Mayan using Docker on my Synology NAS and configured a few cabinets, document types and a workflow to sort incoming documents.
What I am missing from my previous DMS, called “paperless”, is the ability to automatically recognize and assign the correspondent and the document type based on the OCR content.

I tried to do this using a workflow, but I’m failing at the point where I need to select text matching certain criteria in the OCR content (e.g. find an address or name in the document) and then put this text into a metadata field.
(The same logic should be applied to get the address, billing amount, due date, etc.)

Has somebody implemented a similar feature or workflow and could help me out with that?
(I’m new to the Django templating language and am struggling a bit to find out which objects I can use in Mayan to select document, OCR, or workflow information for use in template filters. Is there documentation that I haven’t found yet?)

Many thanks in advance for your replies and have a wonderful weekend!

Best Regards
Joko

Joko · April 6, 2023, 6:35pm

Any feedback from anyone?
It would be nice if anybody could help me.

Thanks and happy Easter, everyone!

roberto.rosario · April 6, 2023, 6:46pm

Extracting specific data like addresses is best done by building a custom app, because the format of these kinds of attributes is not very strict and requires flexible parsing.

For other types of data that use a more regular format, you can use a regular expression as described in the following topic.

You can then use the result of the template as the value for automatic indexing, or assign it to a metadata field via a workflow action, depending on your end goal.
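As a rough illustration of the regular-expression approach, here is a minimal sketch in plain Python. It is not Mayan's template syntax and does not call any Mayan API; the sample OCR text, the patterns, and the function names are all assumptions made for the example. It can be handy for prototyping a pattern against real OCR output before moving it into a template.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical OCR output; real OCR text is usually noisier than this.
SAMPLE_OCR_TEXT = """\
ACME Supplies Ltd.
Invoice No. 2023-0417
Total amount: 1,234.56 EUR
Due date: 2023-04-30
"""

# Assumed label wording; adjust the patterns to the documents you receive.
AMOUNT_PATTERN = re.compile(r"Total amount:\s*([\d,]+\.\d{2})", re.IGNORECASE)
DUE_DATE_PATTERN = re.compile(r"Due date:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def extract_amount(text):
    """Return the billing amount as a Decimal, or None if no match is found."""
    match = AMOUNT_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(match.group(1).replace(",", ""))


def extract_due_date(text):
    """Return the due date as a datetime.date, or None if no match is found."""
    match = DUE_DATE_PATTERN.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()


if __name__ == "__main__":
    print(extract_amount(SAMPLE_OCR_TEXT))
    print(extract_due_date(SAMPLE_OCR_TEXT))
```

Returning None when nothing matches makes it easy to leave a metadata field untouched for documents that do not contain the expected label.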

jake · May 17, 2023, 8:52pm

Hi Joko,

Did you manage to get it working? I would like to do the same as you, and I can’t…

Joko · May 26, 2023, 7:57am

Hi Jake,

I haven’t had the time to configure Mayan further, but I will do that soon.
I will update this post with (hopefully) my results.

I’m planning on building it as a workflow with regex. If necessary for my use case, I will try to build the custom app in Python. But if possible, I’d like to stick to the standard features.

Best
Joko

oduquenoy · November 22, 2023, 9:08pm

Hi
Can you give us a simple example of assigning a value to a metadata field?
I tried something like:

set workflow_instance.document.metadata_value_of.my_metadata = "something"

But the metadata value doesn’t change.

oduquenoy · December 14, 2023, 3:32pm

I finally found it. I hadn’t seen all the options in the dropdown.

system · December 15, 2023, 3:32am

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.