Following the "Exploring Mayan EDMS" book, I was able to create an email source. Now I need to extract information from the sender's email address and from the subject of each email, and store these values in two metadata fields.
Does anyone have experience with this?
Thanks.
Extract metadata from email
Re: Extract metadata from email
- First, create a document type, e.g. mail.
- Then set up two metadata types, subject and sender, and associate them with the document type you created.
- When you set up the mail source, choose that document type.
- In the two drop-downs for the subject metadata and the sender metadata, select the two metadata types you created.
Re: Extract metadata from email
I can assign the subject and sender metadata manually, as you described. But I need a process that automatically extracts this metadata from each mail and assigns the values to the downloaded email and its attachments.
Thanks
Re: Extract metadata from email
I would like to second this request.
How can the header data be parsed for the information it contains?
Also, it seems the HTML body needs to be handled separately. By default it is just a "dead document".
Another problem I have: the "move into..." option after parsing does not seem to work for me. The mails just get deleted.
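For reference, here is a minimal sketch of what "parsing the header data" means outside of Mayan, using only Python's standard `email` module. It is not Mayan's own code; the function name `sender_and_subject` and the sample message are made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_and_subject(raw_message):
    """Return (sender_address, subject) from a raw RFC 822 style message.

    Hypothetical helper for illustration only, not part of Mayan EDMS.
    """
    message = message_from_string(raw_message)
    # parseaddr splits "Jane Doe <jane@example.com>" into (name, address).
    _display_name, address = parseaddr(message.get("From", ""))
    # Note: encoded subjects (RFC 2047) are returned undecoded here.
    subject = message.get("Subject", "")
    return address, subject


raw = (
    "From: Jane Doe <jane@example.com>\n"
    "Subject: 01-08-320 01.09.2020 Hello There\n"
    "\n"
    "Body text\n"
)
print(sender_and_subject(raw))
```

Inside Mayan, the mail source fills the chosen metadata types itself, so a helper like this is only useful for checking what the headers of a given mail actually contain.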
Re: Extract metadata from email
I configured this with a workflow and some obscure templates to sanitize the input.
- The mail source assigns a specific document type.
- A workflow is applied to this document type.
- The workflow has no reset transition, so it is executed only once.
- The mail header looks like "DocNumber DocDate DocHeader" and is temporarily saved by the mail source in a metadata field called meta_header. For example: "01-08-320 01.09.2020 Hello There"
- In the workflow state actions, the temporary value of meta_header is split on spaces and its parts are used to fill the corresponding metadata fields: DocNumber => list[0] => meta_number, DocDate => list[1] => meta_date. The last action extracts DocHeader => list[2:] and updates meta_header with the final value.
States and transitions (trigger: Document version parsing finished):
- No actions for the 0% state.
- Actions for the 100% state. It is important to put numbers in front of the action names, because actions execute in name order, and I want to add the metadata fields first and update the meta_header field last.
Action 0 - Add metadata fields meta_number and meta_date.
Action 1 - Edit metadata field meta_number. The value of this field must be DocNumber.
{% regex_sub "\s+" " " document.metadata_value_of.meta_header as tmp_header %}{% with tmp_header.strip|split:" " as header_splitted %}{% regex_match "[0-9]" header_splitted.0 as starts_with_number %}{% if starts_with_number %}{{ header_splitted.0 }}{% endif %}{% endwith %}
- regex_sub replaces runs of whitespace "\s+" with a single space, and the result is saved as tmp_header.
- .strip removes spaces from the beginning and end of tmp_header; tmp_header is then split on spaces and the result is saved as header_splitted.
- regex_match checks that the item at index 0 of header_splitted starts with a digit, and the result is saved as starts_with_number.
- If starts_with_number is true, the value of header_splitted.0 is written to the metadata field. Otherwise nothing is written.
The same template with index 1 fills meta_date (DocDate):
{% regex_sub "\s+" " " document.metadata_value_of.meta_header as tmp_header %}{% with tmp_header.strip|split:" " as header_splitted %}{% regex_match "[0-9]" header_splitted.1 as starts_with_number %}{% if starts_with_number %}{{ header_splitted.1 }}{% endif %}{% endwith %}
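The last template writes the final meta_header (DocHeader): everything from index 2 onward, or the cleaned header unchanged if the first two parts do not both start with a digit: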
{% regex_sub "\s+" " " document.metadata_value_of.meta_header as tmp_header %}{% with tmp_header.strip|split:" " as header_splitted %}{% regex_match "[0-9]" header_splitted.0 as starts_with_number_0 %}{% regex_match "[0-9]" header_splitted.1 as starts_with_number_1 %}{% if starts_with_number_0 and starts_with_number_1 %}{{ header_splitted | slice:"2:" | join:" " }}{% else %}{{ tmp_header.strip }}{% endif %}{% endwith %}
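For anyone who finds the templates hard to read, the same logic can be sketched in plain Python. This is only an illustration of what the three templates do, assuming regex_match behaves like Python's `re.match` (anchored at the start of the string); the function name `split_header` is made up and is not part of Mayan.

```python
import re


def split_header(raw_header):
    """Mirror the three templates: return (number, date, title).

    Hypothetical helper for illustration only, not part of Mayan EDMS.
    """
    # Collapse whitespace runs and trim, like regex_sub followed by .strip.
    cleaned = re.sub(r"\s+", " ", raw_header).strip()
    parts = cleaned.split(" ")

    def starts_with_digit(index):
        # re.match only matches at the start of the string.
        return len(parts) > index and re.match(r"[0-9]", parts[index]) is not None

    # Templates 1 and 2: write the part only if it starts with a digit.
    number = parts[0] if starts_with_digit(0) else ""
    date = parts[1] if starts_with_digit(1) else ""
    # Template 3: keep the rest as the title, or the whole cleaned header
    # if the first two parts do not both look like number and date.
    if starts_with_digit(0) and starts_with_digit(1):
        title = " ".join(parts[2:])
    else:
        title = cleaned
    return number, date, title


print(split_header("01-08-320  01.09.2020   Hello There"))
```

With the example header from above this gives the number, the date and "Hello There" as three separate values, and a header without a leading number and date is passed through unchanged as the title.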
Re: Extract metadata from email
This is an impressive workflow setup @spirkaa. Thanks for sharing it!