ChronoScan performs zonal OCR and can write custom PDF metadata from the extracted data. To minimize the artifacts needed to populate document metadata in Mayan, we can use the PDF’s custom metadata to our benefit.
Assigning custom PDF metadata from extracted zones in ChronoScan during export.
The same file’s EXIF attributes in Mayan.
Mayan has been configured with two document types:
- Auto Ingest Unknown: all incoming documents are ingested as this type.
- Invoice without Purchase Order: one of the resulting document types after conditional evaluation of EXIF attributes.
Invoice without Purchase Order carries a number of metadata types, which will be populated from EXIF attributes.
EXIF attribute | Document metadata type |
---|---|
customdollaramount | Dollar Amount |
custominvoicenumber | Invoice Number |
customvendorname | Vendor Name |
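
The mapping in the table can be sketched in plain Python. This is an illustration only, not Mayan's API: the `EXIF_TO_METADATA` dictionary and the `map_exif_to_metadata` helper are hypothetical names, and the sample values are made up.

```python
# Hypothetical illustration of the table above; not part of Mayan's API.
EXIF_TO_METADATA = {
    "customdollaramount": "Dollar Amount",
    "custominvoicenumber": "Invoice Number",
    "customvendorname": "Vendor Name",
}


def map_exif_to_metadata(exif_attributes):
    """Return document metadata for every mapped EXIF attribute that is present."""
    return {
        metadata_name: exif_attributes[exif_key]
        for exif_key, metadata_name in EXIF_TO_METADATA.items()
        if exif_key in exif_attributes
    }


# Made-up sample values, standing in for what ChronoScan might write to the PDF.
sample_exif = {
    "customdocumenttype": "invoicewithoutpurchaseorder",
    "custominvoicenumber": "INV-1001",
    "customvendorname": "Acme Supply",
    "customdollaramount": "125.50",
}

print(map_exif_to_metadata(sample_exif))
```

Attributes that are not in the mapping, such as `customdocumenttype`, are ignored here because they drive the document type rather than a metadata value.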
There are two workflows. The first, Auto Ingest Categorize, is assigned to the Auto Ingest Unknown document type. The automatic transition exists because I do not believe EXIF attributes are available to the initial workflow state when a document is created (TODO: clarify).
The Document Categorized state executes an action that changes the document type based on a condition on an EXIF attribute value:
{% if workflow_instance.document.file_latest.file_metadata_value_of.exiftool__customdocumenttype == "invoicewithoutpurchaseorder" %} True {% endif %}
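
The logic of that condition can be sketched in plain Python. This is not Mayan code: `render_condition` and `condition_passes` are hypothetical helpers, and the sketch assumes (as I understand Mayan's condition handling) that a template rendering to blank output counts as false while any other output counts as true.

```python
# Hypothetical stand-in for the workflow condition; not Mayan code.
def render_condition(exif_attributes):
    """Mimic the template: emit "True" only for the expected document type."""
    if exif_attributes.get("customdocumenttype") == "invoicewithoutpurchaseorder":
        return " True "
    return ""


def condition_passes(rendered):
    """Assumption: blank output is false, any other output is true."""
    return rendered.strip() != ""


matching = {"customdocumenttype": "invoicewithoutpurchaseorder"}
other = {"customdocumenttype": "receipt"}  # made-up value for contrast

print(condition_passes(render_condition(matching)))  # True
print(condition_passes(render_condition(other)))     # False
print(condition_passes(render_condition({})))        # False
```

The comparison is an exact string match, so the value ChronoScan writes has to agree with the string in the condition character for character.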
At this point the document type is changed to Invoice without Purchase Order, which launches the second workflow, Invoice Approval.
The first state, Received, assigns document metadata from the file’s EXIF attributes. Each action sets one metadata type with a template such as {{ workflow_instance.document.file_latest.file_metadata_value_of.exiftool__custominvoicenumber }}, which here supplies Invoice Number.
This results in: