Ingesting documents using file EXIF metadata

ChronoScan performs zonal OCR and can write the extracted data into custom PDF metadata. To minimize the artifacts needed to populate document metadata in Mayan, we can use that custom PDF metadata, which Mayan exposes as the file's EXIF attributes.

Assigning custom PDF metadata from extracted zones in ChronoScan during export.

The same file's EXIF attributes in Mayan.

Mayan has been configured with two document types:

  • All incoming documents will be ingested as Auto Ingest Unknown.
  • Invoice without Purchase Order will be one of the resulting document types after conditional evaluation of EXIF attributes.

Invoice without Purchase Order carries a number of metadata types, which will be populated from EXIF attributes.

EXIF attribute        Document metadata type
customdollaramount    Dollar Amount
custominvoicenumber   Invoice Number
customvendorname      Vendor Name
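
The mapping above can also be written down as data. The following is only a sketch in Python, not Mayan code: `EXIF_TO_METADATA`, `PREFIX`, and `value_template` are hypothetical names, and the `exiftool__<key>` lookup path is the one used in the workflow templates further down in this post. It prints the template expression each metadata type would use.

```python
# Hypothetical helper: EXIF attribute name -> Mayan metadata type label,
# taken from the table in this post.
EXIF_TO_METADATA = {
    "customdollaramount": "Dollar Amount",
    "custominvoicenumber": "Invoice Number",
    "customvendorname": "Vendor Name",
}

# Lookup path as it appears in the workflow templates in this post.
PREFIX = "workflow_instance.document.file_latest.file_metadata_value_of"


def value_template(exif_key: str, driver: str = "exiftool") -> str:
    """Build the template expression that reads one EXIF attribute."""
    return "{{ " + f"{PREFIX}.{driver}__{exif_key}" + " }}"


# One line per metadata type, ready to paste into a workflow action.
for exif_key, metadata_label in EXIF_TO_METADATA.items():
    print(f"{metadata_label}: {value_template(exif_key)}")
```

Keeping the mapping in one place makes it easier to check that every metadata type has a matching action.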

There are two workflows:

Auto Ingest Categorize is assigned to the Auto Ingest Unknown document type. It uses an automatic transition because I do not believe EXIF attributes are available in the initial workflow state when a document is created (TODO: clarify).

The Document Categorized state runs an action that changes the document type, guarded by a condition on an EXIF attribute value:

{% if workflow_instance.document.file_latest.file_metadata_value_of.exiftool__customdocumenttype == "invoicewithoutpurchaseorder" %} True {% endif %}
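
For readers less used to template conditions, the same decision can be sketched in plain Python. This is an illustration only: `resolve_document_type` and the dict input are hypothetical and not part of Mayan, and the sketch assumes that a condition rendering to an empty string means the type-change action is skipped.

```python
# Value of the customdocumenttype attribute that marks an invoice,
# as used in the condition in this post.
INVOICE_MARKER = "invoicewithoutpurchaseorder"


def resolve_document_type(exif_attributes: dict) -> str:
    """Return the document type label a document ends up with."""
    if exif_attributes.get("customdocumenttype") == INVOICE_MARKER:
        # Condition renders "True": the change-type action runs.
        return "Invoice without Purchase Order"
    # Condition renders empty (assumed to skip the action): type is unchanged.
    return "Auto Ingest Unknown"


print(resolve_document_type({"customdocumenttype": "invoicewithoutpurchaseorder"}))
print(resolve_document_type({}))
```

A document whose file lacks the attribute simply stays as Auto Ingest Unknown.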

At this point the document type is changed to Invoice without Purchase Order, which launches the second workflow, Invoice Approval.

The first state, Received, assigns document metadata from EXIF attributes. Each action sets one metadata type from the file's EXIF attributes, for example: {{ workflow_instance.document.file_latest.file_metadata_value_of.exiftool__custominvoicenumber }}

This results in:
