Workflows - Access OCR content for condition for adding tags

ZiZoTeX
Posts: 4
Joined: Sun May 03, 2020 3:24 pm

Post by ZiZoTeX »

Hello there,

I am having trouble accessing the content of a document to determine whether to add a tag.
I tried the document.latest_version.content variable, but I guess this is not accessible from a workflow.
I have a workflow that enters a state when OCR processing is finished. In that state I would like to look for certain words and add a tag if they appear in the content.
Unfortunately, I have no clue how to access the OCR content from within the workflow state action.


Any help is greatly appreciated.

Best regards
ZiZoTeX
Posts: 4
Joined: Sun May 03, 2020 3:24 pm

Re: Workflows - Access OCR content for condition for adding tags

Post by ZiZoTeX »

Just realized that I posted the question in the wrong forum. Sorry about that!
ZiZoTeX
Posts: 4
Joined: Sun May 03, 2020 3:24 pm

Re: Workflows - Access OCR content for condition for adding tags

Post by ZiZoTeX »

And now I feel pretty stupid.
Of course I can access everything from workflow_instance.document.latest_version.ocr_content.

I didn't realize I was not using the workflow_instance entrypoint there.
lonestar
Posts: 1
Joined: Fri Jun 12, 2020 9:48 pm

Re: Workflows - Access OCR content for condition for adding tags

Post by lonestar »

Hello,
I'm new here and I'm starting to use Mayan EDMS just now so please forgive me if I'm asking something that sounds obvious, but I am trying to do this same thing: a workflow for adding tags to documents depending on the presence of certain words in their OCR content.

I really don't understand how the action condition should be written :(

I've tried things like:

{% if "my word" in workflow_instance.document.latest_version.ocr_content %}True{% endif %}
and many variations of this but I don't seem to find a way. The action is not performed.

Could anyone please give me a hint of what I'm doing wrong?

Thanks
Luigi
Gubert
Posts: 2
Joined: Wed Jul 29, 2020 1:50 pm

Re: Workflows - Access OCR content for condition for adding tags

Post by Gubert »

lonestar wrote: Fri Jun 12, 2020 10:06 pm Hello,
I'm new here and I'm starting to use Mayan EDMS just now so please forgive me if I'm asking something that sounds obvious, but I am trying to do this same thing: a workflow for adding tags to documents depending on the presence of certain words in their OCR content.

I really don't understand how the action condition should be written :(

I've tried things like:

{% if "my word" in workflow_instance.document.latest_version.ocr_content %}True{% endif %}
and many variations of this but I don't seem to find a way. The action is not performed.

Could anyone please give me a hint of what I'm doing wrong?

Thanks
Luigi
Stuck here too. Can anyone help please?
Tried things like:

{% if "try" in workflow_instance.document.latest_version.ocr_content %}
{% endif %}

{% if "try" in workflow_instance.document.latest_version.ocr_content %}True{% endif %}
Without the condition, the workflow works fine. The documentation is really incomplete; there are no examples for this.
fgdutoit
Posts: 8
Joined: Sat May 23, 2020 9:23 am

Re: Workflows - Access OCR content for condition for adding tags

Post by fgdutoit »

I found that applying the Django filter join to workflow_instance.document.latest_version.ocr_content worked for me:

{% if "my word" in workflow_instance.document.latest_version.ocr_content|join:" " %}True{% endif %}

P.S.: The Template Sandbox (in the document view) really helps with figuring out templates.
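
For anyone wondering why the plain "in" test fails while the joined one works, here is a minimal plain-Python sketch. It assumes that ocr_content yields one string per page; the page_texts generator is a hypothetical stand-in for it, not Mayan EDMS code. Django's {% if x in y %} uses the same membership test as Python's "in" operator, so the behaviour should carry over.

```python
def page_texts():
    """Hypothetical stand-in for ocr_content: yields one string per page."""
    yield "Invoice 2020-05\nTotal due: 100 EUR"
    yield "Terms: my word is my bond"


# Membership against the generator compares the needle with each whole
# page string for equality, so a substring never matches.
direct_hit = "my word" in page_texts()

# Joining the pages first yields a single string, so `in` becomes a
# substring search over the whole document.
joined_hit = "my word" in " ".join(page_texts())

print(direct_hit, joined_hit)
```

Under that assumption, the unjoined condition would only be true if an entire page consisted of exactly the search phrase.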
michael
Developer
Posts: 48
Joined: Sun Apr 19, 2020 6:21 am

Re: Workflows - Access OCR content for condition for adding tags

Post by michael »

Expanding on fgdutoit's solution:
fgdutoit wrote: Fri Aug 14, 2020 4:27 am I found that applying the Django filter join to workflow_instance.document.latest_version.ocr_content worked for me:

{% if "my word" in workflow_instance.document.latest_version.ocr_content|join:" " %}True{% endif %}

P.S.: The Template Sandbox (in the document view) really helps with figuring out templates.
Since "ocr_content" is a generator that returns the OCR content of each page, you can save a bit of memory and get a potential speed boost by using:

{% for page_ocr in workflow_instance.document.latest_version.ocr_content %}
{% if "my word" in page_ocr %}True{% endif %}
{% endfor %}

This iterates over the OCR content of each page instead of joining all the OCR content in a single string, which can be more efficient for documents with a large number of pages.
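
One trade-off between the two approaches, sketched in plain Python under the assumption that each page's OCR text arrives as a separate string: a per-page check cannot see a phrase that straddles a page break, while the joined check can. The function and generator names below are hypothetical illustrations, not part of Mayan EDMS.

```python
from typing import Iterable, Iterator


def pages_split() -> Iterator[str]:
    """Hypothetical pages where the phrase straddles a page break."""
    yield "signed on my"
    yield "word of honour"


def found_per_page(needle: str, pages: Iterable[str]) -> bool:
    # Checks each page on its own and stops at the first page that matches.
    return any(needle in page for page in pages)


def found_joined(needle: str, pages: Iterable[str]) -> bool:
    # Builds one string for the whole document before searching.
    return needle in " ".join(pages)


print(found_per_page("my word", pages_split()))
print(found_joined("my word", pages_split()))
```

Note that Python's any() stops at the first match, whereas the built-in Django template tags offer no way to break out of a {% for %} loop, so the template version still visits every page even after a match.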