Regex and template scripts assembled over the years

Hi all,

Here is a copy of all the regexes and other Django template scripts I have found useful
(sources: the old forum, the documentation and my own work).
The scripts might not work as-is, but they will give you a head start.

Enjoy

=====================
zzz.yyy.all - an array of “all” of a variable content
zzz.yyy.count - the number of a variable item
zzz.yyy.ocr_content - an array of OCR’d content
zzz.yyy.label - the ‘name’ of a variable (such as a tag name or document_type name)

document.yyy.* - document section

document.date_added - date that the document was added to Mayan-EDMS

document.tags.* - document tags variable
document.tags.all - an array of all tags for each document
document.tags.count - how many tags does each document have

document.label.* - document label variable
document.label - [thanks @Shoop_nl]
document.label.count - how many labels does each document contain
document.label.all - an array of all labels for each document

document.latest_version.* - selection of the latest version of a document
document.latest_version.ocr_content - an array of the OCR content, from which you can extract words to use in comparisons, functions, etc.

document.document_type.* - document type variables
document.document_type.label - allows you to read what the name of the document type is for a given document

document.metadata_value_of.*
document.metadata_value_of.[metadata_key] - allows you to test for the existence of this document metadata

document.cabinets.* - reading cabinet information for a given document
document.cabinets.count - how many cabinets is each document in
document.cabinets.all - an array of all cabinets [thanks @wess]

tag.yyyy.* - tag section
tag.label - the tag name

{{ document.document_type }}

{% if document.document_type.label == "Invoice" or document.document_type.label == "Letter" %}
Correspondence
{% else %}
{{ document.document_type }}
{% endif %}

{% if document.metadata_value_of.invoice_number.0 == "A" %}
Accounting
{% elif document.metadata_value_of.invoice_number.0 == "H" %}
Human Resources
{% endif %}

{{ document.metadata_value_of.date_issued|slice:"0:4" }} → year

{{ document.metadata_value_of.date_issued|slice:"5:7" }} → month

{{ document.metadata_value_of.date_issued|slice:"8:10" }} → day
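
The three slices above only give the year, month and day if the metadata value is a `YYYY-MM-DD` string. A minimal Python sketch of the same slicing, assuming that format (the function name `split_iso_date` is made up for illustration and is not part of Mayan):

```python
def split_iso_date(value: str) -> tuple[str, str, str]:
    """Mirror the template slices 0:4, 5:7 and 8:10 on a YYYY-MM-DD string."""
    return value[0:4], value[5:7], value[8:10]


print(split_iso_date("2023-07-15"))
```

If the date is stored in another format (for example `15/07/2023`), the slice positions would need to change accordingly.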

{% if "quarterly report" in document.version_active.ocr_content|join:" "|lower %}Quarterly reports{% endif %}
{% if "quarterly report" in document.version_active.content|join:" "|lower %}Quarterly reports{% endif %}

{% if document.cabinets.count == 0 %}No Cabinets{% endif %}
{% if document.tags.count == 0 %}No Tags{% endif %}

{% for tag in document.tags.all %}
{% if tag.label == "Taxes" %}
{% if document.metadata_value_of.tax_year|length_is:"4" %}
{{ document.metadata_value_of.tax_year }}
{% else %}
{{ document.date_added|date:"Y" }}
{% endif %}
{% endif %}
{% endfor %}

{% method workflow_instance.document.tags "filter" label="new_doc" as tag_result %}
{% if tag_result %}
True
{% endif %}

*** Find names in ocr ***

{% if "Kevin Pawsey" in document.latest_version.ocr_content|join:" " %}Kevin
{% elif "Kevin B Pawsey" in document.latest_version.ocr_content|join:" " %}Kevin
{% elif "K Pawsey" in document.latest_version.ocr_content|join:" " %}Kevin
{% endif %}

{% with ocr=document.latest_version.ocr_content|join:" "|lower %}
{% regex_search "j[ao]hn( +(w|vv)(.|[uv]rt)?)?( +sm[li1]th)?" ocr as result %}
{% if result %}
{{ result.0 }}
{% endif %}
{% endwith %}
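
To try the OCR-tolerant name pattern outside Mayan, here is a small Python sketch. It assumes `regex_search` behaves like Python's `re.search` and that `result.0` is the whole match; the helper name `find_name` and the sample text are made up for illustration.

```python
import re

# Same pattern as the template above; tolerant of common OCR misreads
# such as "jahn" for "john", "vv" for "w" and "sm1th" for "smith".
NAME_PATTERN = re.compile(r"j[ao]hn( +(w|vv)(.|[uv]rt)?)?( +sm[li1]th)?")


def find_name(ocr_text: str):
    """Return the matched text (like result.0 in the template) or None."""
    match = NAME_PATTERN.search(ocr_text.lower())
    return match.group(0) if match else None


print(find_name("Dear Jahn W. Smlth, thank you for your letter"))
```

Note that the `.` after the initial is unescaped, so it matches any character and not only a full stop; `\.` would be the stricter choice if only a literal dot is wanted.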

{% for tag in document.tags.all %}
{{tag.label}}
{% endfor %}

{% if "payslip" in document.label|lower %}
Payslips
{% endif %}

{% if document.cabinets.count == 0 %}
None
{% else %}
{% for cab in document.cabinets.all %}
{{ cab.label }}
{% endfor %}
{% endif %}

{% if not "NoDueDateNeeded" in document.tags.all|join:" " %}
{% if not document.metadata_value_of.date_due %}
No Due Date
{% endif %}
{% endif %}

{{ document.file_metadata_value_of.exiftool__FileType }}

{% if document.metadata_value_of.TransactionDate|slice:"5:7" in '07,08,09,10,11,12' %}
FY{{ document.metadata_value_of.TransactionDate|slice:"0:4"|add:"1" }}
{% elif document.metadata_value_of.TransactionDate|slice:"5:7" in '01,02,03,04,05,06' %}
FY{{ document.metadata_value_of.TransactionDate|slice:"0:4"|add:"0" }}
{% else %}
No Tr Date
{% endif %}
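
The fiscal-year template above can be checked with a few sample dates using a rough Python equivalent. This is only a sketch: it assumes `TransactionDate` is a `YYYY-MM-DD` string and a July to June fiscal year, and it uses a set of month strings where the template does a substring test against a comma-separated list. The function name `fiscal_year` is made up for illustration.

```python
def fiscal_year(transaction_date: str) -> str:
    """Return 'FY<year>' for a YYYY-MM-DD string, using a July-June fiscal year."""
    month = transaction_date[5:7]
    if month in {"07", "08", "09", "10", "11", "12"}:
        # July to December belong to the fiscal year that ends next June.
        return f"FY{int(transaction_date[0:4]) + 1}"
    if month in {"01", "02", "03", "04", "05", "06"}:
        return f"FY{int(transaction_date[0:4])}"
    # Anything else (empty or malformed value) falls through, as in the template.
    return "No Tr Date"


print(fiscal_year("2023-08-01"), fiscal_year("2023-03-01"))
```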

{% method workflow_instance.document.tags "filter" label="new_doc" as tag_result %}{% if tag_result %}True{% endif %}

{% regex_sub "\s+" " " document.metadata_value_of.meta_header as tmp_header %}
{% with tmp_header.strip|split:" " as header_splitted %}
{% regex_match "[0-9]" header_splitted.0 as starts_with_number %}
{% if starts_with_number %}
{{ header_splitted.0 }}
{% endif %}
{% endwith %}
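
The header snippet above does three things: collapse whitespace, take the first word, and keep it only if it starts with a digit. A minimal Python sketch of those steps, assuming `regex_sub` and `regex_match` behave like `re.sub` and `re.match` (the function name `leading_number_token` is made up for illustration):

```python
import re


def leading_number_token(header: str):
    """Return the first word of the header if it starts with a digit, else None."""
    # Collapse every run of whitespace into one space, then trim the ends.
    cleaned = re.sub(r"\s+", " ", header).strip()
    first = cleaned.split(" ")[0]
    # re.match only tests the start of the string.
    return first if re.match(r"[0-9]", first) else None


print(leading_number_token("  42A   Main\tStreet "))
```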

{% for tag in document.tags.all %} {{ tag.label }} {% endfor %}

{# find new species #}
{% with ocr=document.ocr_content|join:" " %}
{% regex_search ".*\s(sp|nov).\s(n|spec)." ocr as result %}
{{ result.0 }}
{% endwith %}

{# get keywords out of article: Keywords: one, two - three #}
{% regex_sub "([a-z])-\n([a-z])" "\1\2" document.ocr_content|join:" " as dehyphened %}{% regex_search "Key\s?words:?\s(([\w\s()-]+)+\s?[-;,\n]?\n?)+\n" dehyphened as keywords_sentences %}{% regex_sub "\s-\s?\n?" "," keywords_sentences.0 as keyword_group %}{% regex_sub "Key\s?words:?\s" "" keyword_group as keyword_group_without_keyword %}{% with keyword_group_without_keyword|split:", " as keywords %}{% for keyword in keywords %}{{ keyword }}
{% endfor %}{% endwith %}

{% comment %}
find all new species:
" sp.n ",
" ps. n. ",
" sp. nov. ",
" n. sp. ",
" nov. spec. ",
" new species "
{% endcomment %}
{% for ocr in document.ocr_content %}{% regex_findall "(\w+\s\w+)\s(sp|n(ov|ew)?).\s(n(ov)|sp(ec|ecies)?)." ocr as species %}{% for specie in species %}{{ specie.0 }}, {% endfor %}{% endfor %}
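
Why `specie.0` gives the species name: when a pattern has several capture groups, Python's `re.findall` returns one tuple per match, and index 0 is the first group. The sketch below assumes `regex_findall` behaves like `re.findall`; the function name `new_species_names` and the sample sentence (with placeholder names) are made up for illustration.

```python
import re

SPECIES_PATTERN = r"(\w+\s\w+)\s(sp|n(ov|ew)?).\s(n(ov)|sp(ec|ecies)?)."


def new_species_names(text: str) -> list[str]:
    """Return the two-word name that precedes each 'sp. nov.'-style marker."""
    # With several capture groups, re.findall yields one tuple per match;
    # index 0 is the first group, i.e. the two words before the marker.
    return [groups[0] for groups in re.findall(SPECIES_PATTERN, text)]


print(new_species_names("We describe Aus bus sp. nov. and Cus dus n. sp. here."))
```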

{% spaceless %} {% set document.ocr_content|join:"" as ocr_text %} {% regex_search "\s(n(ew|.)\s?sp(ecies|\.))\s" ocr_text as matches %} {{ matches.0 }} {% endspaceless %}

=====================

Thanks for sharing! Very helpful since it is not easy to find templates for such cases :+1: :+1: :+1:

Wow! This is great! Thank you.