I’ve been approached about whether or not Mayan EDMS can be configured to scan for potentially malicious documents and act accordingly. I did a few Google searches and a few forum searches and came up empty.
Have you encountered this sort of question, and what was your response?
Got this question from an auditor trying to sound smart. Sent a memo pointing to the code showing how documents are processed in an isolated environment. Any damage a malicious document could try to do will be confined to isolated temp files that are discarded after processing. In the same memo I included links to all the unpatched exploits in the MS Office and Adobe products we use daily that make them vulnerable to malicious documents. Haven’t heard from the auditor again.
Best advantage open source has over closed software: it can be easily audited with proof.
I assume that Mayan has no built-in solution yet to handle malicious documents. However, much depends on your business environment and the workflows you are dealing with.
Which kinds of documents carry a potential risk of being malicious? For example:
Text files, images: very low
pdf, xlsx, docx: low
doc, xls, binary: high
How are documents uploaded? Maybe you have the option of using an upstream system to check documents before they are uploaded:
Email => Email scanner
Watch-Folders, By users => Malware solution on OS
So in many cases you can and should make sure to allow and upload only safe documents.
However, if you really need to scan your documents regularly, one option could be to build another Docker microservice for this job, for example:
Docker container with an AV engine, e.g. ClamAV, with access to Mayan’s document storage
Install Python ClamAV modules
Write a script to scan documents as needed and handle them with Mayan’s API (e.g. restrict them to admin access only, delete them, tag them, whatever…)
Please note that this is only one untested possible solution, maybe someone else has a better concept?
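To make the steps above a bit more concrete, here is a minimal, equally untested sketch of the scanning part only. It assumes the `clamscan` command-line tool is installed in the container and that Mayan’s document storage is mounted read-only somewhere inside it. It calls the CLI through `subprocess` instead of one of the Python ClamAV modules, purely to keep the example dependency-free. The function names (`classify`, `scan_file`, `scan_tree`) are made up for illustration and are not part of Mayan or ClamAV.

```python
import subprocess
from pathlib import Path

# clamscan exit codes: 0 = no virus found, 1 = virus found, 2 = an error occurred.
VERDICTS = {0: "clean", 1: "infected"}


def classify(returncode):
    """Map a clamscan exit code to a verdict string (hypothetical helper)."""
    return VERDICTS.get(returncode, "error")


def scan_file(path, run=subprocess.run):
    """Scan one file with the clamscan CLI and return its verdict.

    `run` is injectable so the function can be exercised without ClamAV installed.
    """
    result = run(
        ["clamscan", "--no-summary", str(path)],
        capture_output=True,
        text=True,
    )
    return classify(result.returncode)


def scan_tree(root, scan=scan_file):
    """Scan every regular file under `root`.

    Returns a dict of {path: verdict} containing only files that were not clean,
    i.e. files that were flagged as infected or could not be scanned.
    """
    findings = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            verdict = scan(path)
            if verdict != "clean":
                findings[str(path)] = verdict
    return findings
```

Calling `scan_tree()` on the mounted storage directory would return the flagged files. Mapping a flagged storage file back to its Mayan document and then tagging, restricting or deleting it through the API is deliberately left out, since the exact endpoints and the storage layout depend on your Mayan version and configuration.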
Good point: while users are interacting with Mayan, what they see are representations of the documents. However, when a user downloads the original, which we can’t restrict in our case, we have a concern.
For us, we mainly store and process Outlook email files (.eml) and PDFs, plus some Microsoft Office documents. Users need to download originals in many cases, and that’s where the concern is. Mainly I wanted to know whether there were already some sort of hooks to pass documents to ClamAV or the like, and I understand there aren’t any built in just now.
In our case, our documents ultimately wind up in an Azure storage account, so I will likely try out Azure ‘Malware scanning in Defender for Storage’.