Adding metadata automatically

Post by jabbrwcky » Sun Mar 03, 2019 10:24 pm

Hi,

I just took Mayan for a spin for an hour or two. It looks quite nice so far, although I could use more extensive documentation of its concepts and features. This is also my first excursion into EDMS, so I expect some challenges :)

Is there a way to automatically extract metadata from a (scanned) document, e.g. invoice number, date, amount, etc.?

My best guess would be a workflow, since it could react to OCR being finished, but I haven't figured anything out yet.
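
Whatever ends up triggering it inside Mayan (a workflow reacting to OCR completion is only a guess at this point), the extraction step itself usually comes down to pattern matching on the OCR text. A minimal sketch in plain Python, assuming the OCR output is already available as a string; the field names, the regular expressions and `extract_invoice_metadata` are all hypothetical, tuned to one made-up invoice layout, and not part of any Mayan API.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Hypothetical patterns for a single invoice layout; real invoices vary a lot,
# so each supplier would likely need its own set.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:.]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"Date\s*[:.]?\s*(\d{4}-\d{2}-\d{2}|\d{2}\.\d{2}\.\d{4})", re.IGNORECASE
    ),
    # Assumes "1,234.50" style amounts (comma as thousands separator).
    "amount": re.compile(
        r"Total\s*[:.]?\s*(?:EUR|\$|€)?\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE
    ),
}


def parse_date(text: str) -> date:
    """Accept ISO dates and day-first dotted dates."""
    for fmt in ("%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def extract_invoice_metadata(ocr_text: str) -> dict:
    """Return whichever fields could be found; missing ones are left out."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match is None:
            continue  # leave the field unset rather than guess
        raw = match.group(1)
        if name == "date":
            result[name] = parse_date(raw)
        elif name == "amount":
            result[name] = Decimal(raw.replace(",", ""))
        else:
            result[name] = raw
    return result
```

OCR errors (a "0" read as "O", for instance) will defeat patterns like these, which is why leaving a field empty is preferable to filling it with a guess.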

Any pointers are very welcome.

Thanks,
Jens

Re: Adding metadata automatically

Post by KevinPawsey » Wed Mar 06, 2019 1:27 pm

There is an add-on called Document Analyzer that I think will do that job, but I'm not sure.

I haven't managed to get the add-on working, so I can't say what it is like to use.
Running Mayan-EDMS on: OpenMediaVault (Docker plugin) on x86 dual-core

Re: Adding metadata automatically

Post by frizzo » Tue Apr 16, 2019 2:31 pm

This is my challenge as well. I want to add values from the PDF 'Info' dictionary, such as keywords, author, etc.
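
For reading the Info dictionary itself, a maintained library is the safer route (pypdf, for example, exposes it as `PdfReader(path).metadata`). To show what the raw data looks like, here is a crude standard-library sketch that assumes an uncompressed PDF whose Info entries are stored as literal strings. It does not resolve object references, compressed object streams or backslash line continuations, and it can pick up `/Title` from bookmarks, so treat `read_info_fields` (a hypothetical helper) as illustration only.

```python
import re

# Text-valued Info-dictionary keys defined by the PDF specification.
_KEY_RE = re.compile(rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*\(")
_OCTAL_RE = re.compile(rb"[0-7]{1,3}")
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
            b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}


def _read_literal(data: bytes, start: int) -> str:
    """Decode a PDF literal string whose opening '(' ends just before `start`."""
    out = bytearray()
    depth = 1  # literal strings may contain balanced, unescaped parentheses
    i = start
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            octal = _OCTAL_RE.match(data, i + 1)
            if octal:  # e.g. \053 -> "+"
                out.append(int(octal.group(), 8) & 0xFF)
                i += 1 + len(octal.group())
                continue
            nxt = data[i + 1:i + 2]
            out += _ESCAPES.get(nxt, nxt)
            i += 2
            continue
        if c == b"(":
            depth += 1
        elif c == b")":
            depth -= 1
            if depth == 0:
                break
        out += c
        i += 1
    raw = bytes(out)
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE with byte-order mark
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")  # rough stand-in for PDFDocEncoding


def read_info_fields(pdf_bytes: bytes) -> dict:
    """Crudely collect Info-dictionary text fields from an uncompressed PDF."""
    fields = {}
    for match in _KEY_RE.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        # Keep the first hit; /Title also appears in bookmark entries,
        # so this can pick the wrong one.
        fields.setdefault(key, _read_literal(pdf_bytes, match.end()))
    return fields
```
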
What I have found so far:
  • There are metadata parsers and validators; ones for date and/or time attributes are included. As I understand it, Mayan can be extended with custom parsers and validators.
  • These definitions are made in the setup -> metadata types menu. One option is to use Django template language snippets, but I am way out of my depth there and have no idea how that works. A "path" to parsers and validators can also be defined, and there is a picklist for date, time, and combined date+time. I have been trying to find the (Python, I guess) code of those to use as an example, but no luck so far.
  • I have been checking the mayan-edms/media/config.yml file. It contains some settings pointing to the parsers/validators used in the metadata types configuration.
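
Going by the description above, a validator checks an entered value, a parser rewrites it into a normalised form, and both are referenced by a "path". A minimal sketch of that idea in plain Python, assuming the path is a dotted import path of the kind Python settings commonly use. `InvoiceNumberValidator`, `LooseDateParser`, `load_from_path` and their method names are hypothetical, not Mayan's actual base classes, so the real interface would need checking in the source of Mayan's metadata app.

```python
import importlib
import re
from datetime import datetime


class InvoiceNumberValidator:
    """Hypothetical validator: accept only values shaped like 'INV-2019-0042'."""

    pattern = re.compile(r"INV-\d{4}-\d{4}")

    def validate(self, value: str) -> None:
        if not self.pattern.fullmatch(value):
            raise ValueError(f"not an invoice number: {value!r}")


class LooseDateParser:
    """Hypothetical parser: normalise several date spellings to ISO 8601."""

    # Day-first formats are an assumption; "03/04/2019" is ambiguous otherwise.
    formats = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")

    def parse(self, value: str) -> str:
        for fmt in self.formats:
            try:
                return datetime.strptime(value.strip(), fmt).date().isoformat()
            except ValueError:
                continue
        raise ValueError(f"unrecognised date: {value!r}")


def load_from_path(dotted_path: str):
    """Resolve 'package.module.ClassName' to the object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)
```

With a setting that lists such paths, the application can import the class named by each entry and call it on the value the user typed.
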
Any help/examples would be much appreciated. I'll post here if I make any progress.
Cheers, frizzo
