Mayan EDMS index reference post

KevinPawsey
Posts: 85
Joined: Wed Aug 22, 2018 2:52 pm

Post by KevinPawsey » Tue Dec 18, 2018 2:02 pm

Hi,

I thought that I would start a thread on the indexing system, as it is so powerful, and yet there seems to be very little information on how to create indexes. I figured that if we pooled our resources, sharing what indexes we currently have and what variables we are using, this would be a great way of fully utilizing this great product.

What do I use Mayan-EDMS for?
Home filing... a utility bill comes through the door, it gets tagged with "utility bill", given a document type of 18 months, and scanned in... the rest is just magical :) I have several indexes created: one indexes documents by the people mentioned in them, another is based on the tags, and another will be based on the utility company mentioned in the document (at the moment I have an "issue" telling the difference between a letter from the bank and a letter from a company whose attached remittance form names that same bank :/). Also, I am using it to store user manuals, magazines, recipe books/PDFs from websites... basically all those things that you usually have knocking around in the Downloads folder on your computer and then curse and swear about when you have a clear-out and can't find them again :D

What do I run Mayan-EDMS on?
At the moment it is a very basic workstation (an old Dell) with a Core2Duo (told you it was old!!), 16 GB memory, 2x150 GB hard drives (RAID1 via hardware) for data storage, and a single drive for the OS. For the operating system I use OpenMediaVault (http://www.openmediavault.org/), and on it I use Docker for PostgreSQL and Mayan-EDMS (and watchtower for updates). That is it... it works well, and apart from the odd time that the Postgres container gets an update that causes the Mayan container to have a bit of a sulk, it is rock solid.

What are Indexes?
Indexes are similar to the Cabinet system, except they are dynamic, so you can set a list of "rules" so that the index can be built with each addition of a new document.

What can be indexed?
Well... just about anything, if you attach the right data (or metadata, which I haven't played with much), and even the contents of the documents once OCR'd.

Why are Indexes so powerful?
Wouldn't it be great to have a read-only folder structure of all the indexed documents that you have scanned and OCR'd in Mayan-EDMS? Well... guess what, it is possible! Yes, you can get Mayan to mount an Index, and you end up with a folder structure that is the same as the index structure, meaning that you don't even need to give people access to the Mayan-EDMS interface: just share out the mounted folder and they can find the documents they need through the mounted index! I don't actually use this myself, but I can imagine that in a production environment it would be great to give people access to filed documents, and being read-only they wouldn't be able to actually make any changes.

How do you create indexes?
If you go to "System" then "Setup" in the top menu, there is an "Indexes" button.
You can create a new Index by clicking on Actions and then "Create Index".
You are then presented with a fairly basic screen, asking for:
  • Label - name of the index (the name that you see in the Indexes menu item)
  • Slug - this is how the index is referred to internally in Mayan-EDMS. This is if you are building other indexes referring to this (I believe)
  • Enabled - you can create an index if you want, and not have it enabled... maybe for testing purposes
When you click on "Submit" you are taken back to the list of Indexes.
To make the Index useful you need to do 2 things:
  • give it some document types to index
  • tell it what you want to index based on
If you click on "Document types" you can add the document types you want to include in the index (for instance, you may only want invoices indexed, if you have a document type for that). So it is worth reviewing how you use your document types to make sure that you are indexing what you need.
Once you have told it which document types to use, then comes the more complicated part of constructing your indexes.
Click on "Tree template" and you will be taken to the screen where you can add the templates for the indexing "folders", of which there is always a "root" and then "child" folders.
To get started you need to click on "New Child Node", which presents you with a blank box... this is where there appears to be a slight shortage of information in the community about what goes in this box.

So this is an extract of one of the indexes that I have at the moment:

Code:

{% if "Kevin Pawsey" in document.latest_version.ocr_content|join:" " %}Kevin
{% elif "Kevin B Pawsey" in document.latest_version.ocr_content|join:" " %}Kevin
{% elif "K Pawsey" in document.latest_version.ocr_content|join:" " %}Kevin
{% endif %}
What this does is look through the document's OCR content for mentions of any of the above variations on my name, and then put the document in the index "folder" Kevin. This is the only way that I have found of doing this at the moment. It would be much more efficient if I could work out a "for ..." method, so that I could have one list and one statement, and it would be more streamlined.
I then have various other members of the family in that same "Tree template", which then means that each person has their own "folder" ... what is great about this is, if more than one person is mentioned in the document, then it actually creates a folder with multiple names, meaning that, if you get a 'joint' letter come in addressed to more than one person, it will be filed under both names in one folder.

It is important to remember that the language used to construct these indexes is very finicky, so case and spacing are to be taken into consideration. So to do an "if" statement you need the following:

Code:

{% if [if statement] %}
{% elif [else statement] %}
{% endif %}
The elif branch is not essential, but it fit my need for multiple variations on my name in the document to be filed under the same 'folder'.
In the examples on indexing in the Mayan-EDMS documentation https://docs.mayan-edms.com/chapters/in ... mples.html there are a few variables that you can play with, but this is very limited. The document.latest_version.ocr_content is what I have found most useful, as this allows you to look inside the OCR content and extract information based on this. There may be many other variables, but what is in the aforementioned page is the only source that I have found so far.

Thanks to @Shoop_nl for pointing out that I forgot to mention that you need to check the Linked Documents option to see a list of the documents that have been indexed. Without checking this you will not see any documents inside the indexed folders. This should be ticked on the lowest level of your indexing structure. If you tick it on the higher levels there will not be an option to "drill" any further into your index, and all the indexed documents will show at that level.

To get the syntax of the if / elif / else statement, I researched the following page, which is mentioned on the Tree template page:
https://docs.djangoproject.com/en/1.11/ ... /builtins/
This is a very helpful page, but it only covers the built-in tags and filters, not the variables that are available within Mayan-EDMS.

At the moment this works well for me, just for filing individuals' letters... but when it comes to letters from companies, I would like something a bit more "dynamic"... so that all I need to do is add the new company to a "list" somewhere, and it will then automatically be indexed into a new "folder".

My next hurdle is going to be my company index... I need to construct an index that looks at the first occurrence of a name in a document... I will keep you posted.

I would love to hear from people about what they are using as an index, how it is helping them... and any other Mayan variables that people are using. I will try to keep a running post of any input people give, so that others may use it as a reference for building indexes.

Hopefully people will find this useful for getting off the ground with creating indexes... maybe someone could make this a 'sticky' so that people can find it (if enough people say it is useful!)

Kevin

Edit: added @Shoop_nl's explanation of Linked Documents
Last edited by KevinPawsey on Mon Dec 24, 2018 9:59 am, edited 2 times in total.
Running Mayan-EDMS on: OpenMediaVault, (Docker plugin), on x86 dual-core

Post by KevinPawsey » Tue Dec 18, 2018 2:08 pm

This is where I intend to keep a list of variables and their functions, as people suggest them and their uses.

zzz.yyy.all - an array of "all" of a variable content
zzz.yyy.count - the number of a variable item
zzz.yyy.ocr_content - an array of OCR'd content
zzz.yyy.label - the 'name' of a variable (such as the tag name or the document_type name)

document.yyy.xxx - document section

document.date_added - date that the document was added to Mayan-EDMS

document.tags.xxx - document tags variable
document.tags.all - an array of all tags for each document
document.tags.count - how many tags does each document have

document.label.xxx - document label variable
document.label - the label of the document as a single string; by default it holds the file name of the uploaded document [thanks @Shoop_nl]

document.latest_version.xxx - selection of the latest version of a document
document.latest_version.ocr_content - this is an array of the OCR content, from which you can extract words and use for comparisons, etc in functions

document.document_type.xxx - document type variables
document.document_type.label - allows you to read what the name of the document type is for a given document

document.metadata_value_of.xxx
document.metadata_value_of.[metadata_key] - the value of this metadata for the document; it is empty when the metadata is not set, so it can also be used to test for its existence

document.cabinets.xxx - reading cabinet information for a given document
document.cabinets.count - how many cabinets is each document in
document.cabinets.all - an array of all cabinets [thanks @wesss]

tag.yyyy.xxx - tag section
tag.label - the tag name
Last edited by KevinPawsey on Fri Feb 22, 2019 3:37 pm, edited 6 times in total.

Post by KevinPawsey » Tue Dec 18, 2018 2:08 pm

[reserved for future expansion]

Post by KevinPawsey » Tue Dec 18, 2018 2:22 pm

So these are the indexes I have at the moment:

Extracting people's names from OCR content:
{% if "Bob Smith" in document.latest_version.ocr_content|join:" " %}Bob
{% elif "B Smith" in document.latest_version.ocr_content|join:" " %}Bob
{% elif "Mr Smith" in document.latest_version.ocr_content|join:" " %}Bob
{% endif %}
{% if "Jane Smith" in document.latest_version.ocr_content|join:" " %}Jane
{% elif "J Smith" in document.latest_version.ocr_content|join:" " %}Jane
{% elif "Mrs Smith" in document.latest_version.ocr_content|join:" " %}Jane
{% endif %}
This will give an index of
root
|
---- Bob
|
---- Jane
|
---- Bob Jane
Then anything with Bob in it goes under Bob, anything with Jane under Jane and anything mentioning both goes under "Bob Jane".
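
To see why a document that mentions both people lands in a "Bob Jane" folder, here is a rough plain-Python sketch of what the two if/elif blocks above work out to. It only illustrates the logic and is not something you can paste into a node template. The names PEOPLE and folder_for and the sample pages are made up for the example, and it assumes that the whitespace between the two rendered blocks collapses to a single space, which is what the resulting folder name suggests.

```python
# Hypothetical stand-in for the two template blocks above; not Mayan-EDMS code.
PEOPLE = {
    "Bob": ["Bob Smith", "B Smith", "Mr Smith"],
    "Jane": ["Jane Smith", "J Smith", "Mrs Smith"],
}


def folder_for(ocr_pages):
    """Return the folder name the two if/elif blocks would produce."""
    # Same idea as ocr_content|join:" ": one string made of all the pages.
    text = " ".join(ocr_pages)
    # Each person's block emits the folder name once if any variant matches.
    hits = [
        folder
        for folder, variants in PEOPLE.items()
        if any(variant in text for variant in variants)
    ]
    # Assumption: blank output between the blocks collapses to one space.
    return " ".join(hits)


print(folder_for(["Dear Mr Smith,", "your bill is due"]))  # Bob
print(folder_for(["Dear Jane Smith and Bob Smith"]))       # Bob Jane
print(repr(folder_for(["To the occupier"])))               # ''
```

An empty result corresponds to the template rendering nothing, in which case the document is not filed under any person.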

Tags in an indexed structure:

Code:

{% for tag in document.tags.all %} {{tag.label}} {% endfor %}
This produces an Index of:
root
|
---- tag1
|
---- tag2
|
---- tagX

This is very useful: if these are created as child folders in an index, you can narrow down a document. For instance, for the above person index, you may then want to put a year, or a year and month, under each person... this limits the number of documents you need to look through to get what you are looking for.

Documents in Date folders:

Code:

{{ document.date_added|date:"Y" }}
This produces an output of
root
|
---- 2018
|
---- 2017
If you then create a child node under this for the month, with the following template:

Code:

{{ document.date_added|date:"m" }}
you then get the following:
root
|
---- 2018
---------02
---------06
---------08
|
---- 2017
---------11
---------12
Of course, any of these can be nested in any way to get varying results... such as Year>Month>Person ... or Tag>Person or Person>Tag ... as a few ideas
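
As a rough plain-Python sketch of the Year > Month nesting above (an illustration only, not template code), the two date filters behave like the strftime codes below. The function name index_path and the sample dates are made up for the example.

```python
from datetime import datetime


def index_path(date_added):
    """Build the Year > Month path for a document's date_added."""
    year = date_added.strftime("%Y")   # like date:"Y", e.g. "2018"
    month = date_added.strftime("%m")  # like date:"m", zero-padded, e.g. "02"
    return f"root/{year}/{month}"


print(index_path(datetime(2018, 2, 14)))  # root/2018/02
print(index_path(datetime(2017, 11, 3)))  # root/2017/11
```

Because the month is zero-padded, the month folders sort correctly by name within each year.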

Shoop_nl
Posts: 3
Joined: Thu Dec 20, 2018 2:03 pm

Post by Shoop_nl » Fri Dec 21, 2018 4:35 pm

Hi Kevin,

This is a very useful post for me; it got me up to speed with using indexes.
I am using Mayan-EDMS for the same reason as you do, home filing.

For the indexes I was searching for a way to index (partially) on a part of the file name of the document (using year and month).
Anyway, this works now.

The only thing I was missing in your story is that you have to check the Link Documents field on the lowest child node entry,
else it will not show any documents (at least that was my case).

Thanx for your info!
Running Mayan-edms on QNAP-NAS with MySQL

Post by KevinPawsey » Mon Dec 24, 2018 9:46 am

Hi Shoop,

Glad it was of some use to you... did you get the file name index working? If you did, I would appreciate it if you could post the syntax of the index that you used. That way I can add it to the reference in the above post (with credit of course!).

You are correct, I did miss that out about the Linked Documents, and will add that bit in soon... otherwise there will be a lot of people with indexes and no content (although I have never tried it to see what would happen).

Thanks again for your input.

Kind regards


Kevin
Last edited by KevinPawsey on Fri Feb 22, 2019 3:37 pm, edited 1 time in total.

Post by Shoop_nl » Mon Dec 24, 2018 3:50 pm

Kevin,

Payslip file names are always built up in the same way:

Payslip-nnnnnn-yyyy-m.pdf

Where:
nnnnnn: is employee number
yyyy: is the year of the payslip
m: is the month of the payslip. (this can have 1 or 2 digits, depending on which month it is)

I use child nodes to index down to the end point where my documents are linked.
In the case of payslips it works like this:
"Company" is my index label (and slug); it is linked to the document type Payslip.
Then, in the tree template, I added a new child node at the root:

Code:

{% if "payslip" in document.label|lower %}Payslips{% endif %}
Then in this child node I added a new child node:

Code:

{{document.label|slice:"15:19"}}
This picks up the year part of document.label (document.label is filled with the file name of the document inserted into Mayan-EDMS).
Then, in the child node for the year, I added a new child node (where the Link documents option is checked):

Code:

{% if document.label|length < 26 %}0{{document.label|slice:"20:21"}} {% else %}{{document.label|slice:"20:22"}} {% endif %}
This picks up the month part of document.label and takes into account whether the month has 1 or 2 digits; in the index it always shows as two digits.
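
To check the slice positions against a file name, the three child nodes can be mirrored in plain Python, since the slice filter uses the same indexing as Python slices. This is only a sketch of the logic: the function name payslip_path and the employee number 123456 are made up for the example.

```python
def payslip_path(label):
    """Mirror the three child nodes: Payslips > year > two-digit month."""
    if "payslip" not in label.lower():
        return None  # the first node renders nothing, so no folder is made
    year = label[15:19]            # same as slice:"15:19"
    if len(label) < 26:            # one-digit month, e.g. "...-7.pdf"
        month = "0" + label[20:21]
    else:                          # two-digit month, e.g. "...-11.pdf"
        month = label[20:22]
    return f"Payslips/{year}/{month}"


print(payslip_path("Payslip-123456-2018-7.pdf"))   # Payslips/2018/07
print(payslip_path("Payslip-123456-2018-11.pdf"))  # Payslips/2018/11
```

The positions only line up while the employee number stays six digits long; a longer or shorter number would shift the year and month slices.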

Note:
The payslips come into Mayan-EDMS via a Source watch folder called Payslips
(I have to download them from the accountant's website and put them in this watch folder; the rest is automatic now).

wesss
Posts: 3
Joined: Wed Feb 13, 2019 8:35 pm

Post by wesss » Fri Feb 15, 2019 9:53 pm

If you'd like an index that uses the Cabinet label, this example may help:

Code:

{% if document.cabinets.count == 0 %} None
{% else %}
{% for cab in document.cabinets.all %}
{{ cab.label }}
{% endfor %}
{% endif %}
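
A rough plain-Python reading of the template above, for illustration only. The function name cabinet_folder is made up, and the sketch assumes that when a document sits in several cabinets the labels end up in one folder name separated by single spaces, in the same way that two matching names become "Bob Jane" earlier in this thread.

```python
def cabinet_folder(cabinet_labels):
    """Return the folder name the cabinet template would render."""
    if len(cabinet_labels) == 0:      # document.cabinets.count == 0
        return "None"
    # The for loop emits every label in turn; assumption: the surrounding
    # whitespace collapses, leaving the labels separated by single spaces.
    return " ".join(cabinet_labels)


print(cabinet_folder([]))                 # None
print(cabinet_folder(["Bills"]))          # Bills
print(cabinet_folder(["Bills", "2019"]))  # Bills 2019
```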

Post by KevinPawsey » Thu Mar 14, 2019 6:19 pm

wesss wrote:
Fri Feb 15, 2019 9:53 pm
If you'd like an index that uses the Cabinet label, this example may help:

Code:

{% if document.cabinets.count == 0 %} None
{% else %}
{% for cab in document.cabinets.all %}
{{ cab.label }}
{% endfor %}
{% endif %}
Thank you for that... just tried it and it works a charm... great for picking out the straggling documents that don't have a cabinet ;)

Rihoj
Posts: 3
Joined: Thu Jul 25, 2019 9:55 pm

Post by Rihoj » Sat Jul 27, 2019 7:40 pm

This took me a while to figure out so I thought I would share.

I have an index to help me figure out which items I still need to update metadata for. However, even though I have them under the same document type, not all of them need the same metadata. So I set up an index to tell me if there was no metadata on the file, but I needed a way to exclude certain files. I added a tag called "NoDueDateNeeded", but could not find a way to keep those files out of the index. So this is what I came up with (if someone has a better way, then I am open to it):

Code:

{% if not "NoDueDateNeeded" in document.tags.all|join:" " %}{% if not document.metadata_value_of.date_due %}No Due Date{% endif %}{% endif %}
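
Read as plain Python, the nested test above looks roughly like the sketch below. It is an illustration only: the function name due_date_folder is made up, the metadata is modelled as a simple dict, and it assumes that joining the tags produces their labels.

```python
def due_date_folder(tag_labels, metadata):
    """Return the folder name, or "" when the document stays out of the index."""
    joined = " ".join(tag_labels)      # like document.tags.all|join:" "
    if "NoDueDateNeeded" in joined:    # substring test on the joined text
        return ""
    if not metadata.get("date_due"):   # missing or empty value
        return "No Due Date"
    return ""


print(due_date_folder(["Invoice"], {}))                        # No Due Date
print(repr(due_date_folder(["NoDueDateNeeded"], {})))          # ''
print(repr(due_date_folder(["Invoice"], {"date_due": "x"})))   # ''
```

One thing to keep in mind is that the check is a substring match on the joined text, so any other tag whose name contains "NoDueDateNeeded" would exclude a document as well.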
