I have a number of documents that routinely come to me with the date in yyyy-mm-dd format in the file name, and thus, once imported, in the document label. I created a workflow which conducts a regex search for this date and adds it as metadata in a receipt_date metadata field. That part seems to work great. The problem comes when I try to index from this field.
In the sandbox, {{ document.metadata_value_of.receipt_date }} returns the expected yyyy-mm-dd with no other characters. But in an index, {{ document.metadata_value_of.receipt_date|slice:"0:4" }} returns only 20, i.e. the first two characters of the date. If I manually click edit metadata on a file, change nothing, select the check mark for this metadata field, and click edit, the index works as expected, with the above returning the four-digit year.
Any ideas? I have about 3000 documents formatted this way, so manually opening each and “editing” the metadata is less than ideal.
[SOLVED] Weird date issue
Last edited by Tim on Wed Feb 03, 2021 8:50 pm, edited 1 time in total.
Re: Weird date issue
Ok. I figured it out.
My Workflow action adding the metadata was as follows:
Code:
{% regex_match "([0-9]{4}[-/]?((0[13-9]|1[012])[-/]?(0[1-9]|[12][0-9]|30)|(0[13578]|1[02])[-/]?31|02[-/]?(0[1-9]|1[0-9]|2[0-8]))|([0-9]{2}(([2468][048]|[02468][48])|[13579][26])|([13579][26]|[02468][048]|0[0-9]|1[0-6])00)[-/]?02[-/]?29)" document.label as m %}
{{ m.0 }}
What was happening is that this adds a hidden \r\n to the beginning of the stored value, picking up the line break between the regex_match tag and the {{ m.0 }}. I’m not sure why; maybe I’m too new and that’s expected.
Putting both on a single line:
Code:
{% regex_match "([0-9]{4}[-/]?((0[13-9]|1[012])[-/]?(0[1-9]|[12][0-9]|30)|(0[13578]|1[02])[-/]?31|02[-/]?(0[1-9]|1[0-9]|2[0-8]))|([0-9]{2}(([2468][048]|[02468][48])|[13579][26])|([13579][26]|[02468][048]|0[0-9]|1[0-6])00)[-/]?02[-/]?29)" document.label as m %}{{ m.0 }}
causes the date to be filled in correctly and solves the issue.
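The effect of a stray line break in front of the date can be reproduced outside Mayan with plain Python string slicing. This is only a sketch: the stored_value below is a made-up example of what the metadata field presumably contained, and Python's slice stands in for the template's slice:"0:4" filter.

```python
# Hypothetical stored value: the template's line break ends up in front of the date.
stored_value = "\r\n2021-02-03"

# Equivalent of slice:"0:4" applied to the raw value:
# two of the four characters are the invisible \r and \n.
raw_year = stored_value[0:4]

# Stripping surrounding whitespace first gives the intended year.
clean_year = stored_value.strip()[0:4]

print(repr(raw_year))    # '\r\n20'
print(repr(clean_year))  # '2021'
```

This would also explain why re-saving the metadata by hand fixed individual documents, assuming the edit form strips surrounding whitespace before saving; that part is a guess, not something confirmed here.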
Also, I know the regex is probably overly complicated, in that it will pick up dates clearly not intended for this application. I spent a few days assuming my regex-noobness was the issue and tried as many permutations of the yyyy-mm-dd search pattern as I could. The regex doesn’t seem broken, so I’m not going to fix it.
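For anyone who does want a simpler pattern, one option is to match only the yyyy-mm-dd shape and leave calendar validation (month lengths, leap years) to a date parser. The following is a minimal Python sketch of that idea, not a Mayan template; extract_date is a hypothetical helper, and unlike the regex above it only accepts hyphen-separated dates.

```python
import re
from datetime import datetime

# Shape check only: four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_date(label):
    """Return the first valid yyyy-mm-dd date found in label, or None."""
    for candidate in DATE_PATTERN.findall(label):
        try:
            # strptime rejects impossible dates such as 2021-02-30.
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue
        return candidate
    return None


print(extract_date("2021-02-03 invoice.pdf"))  # 2021-02-03
print(extract_date("2021-02-30 invoice.pdf"))  # None
print(extract_date("scan 2020-02-29.pdf"))     # 2020-02-29
```

The trade-off is that validation moves out of the pattern and into code, which is easier to read but is not something a single template tag can do on its own.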