Bulk Export of All Documents and Metadata to Human Readable Format

This is related, but not identical, to Bulk downloading.

Is there a way to do a bulk export of all documents and metadata, including event history per document, so that the result can be read by humans?
Ideally, the export could also be read by machines, perhaps by exporting a JSON file per document for the metadata and event history.

Background:
I’m currently evaluating whether Mayan EDMS might be a good fit for the accounting department of the non-profit organization where I work as an IT guy. In 40 years of IT, I have seen many systems come - and almost as many go. If we decide in 15 years’ time that we should use a different DMS, we need a way to archive the data accumulated in Mayan EDMS so that it is readable even without having Mayan EDMS available. By law, we are required to store accounting data for 10 years. So any decision to switch DMS would require either keeping the old DMS active for another 10 years (imagine the effort of maintaining a DMS that would run on nothing newer than Windows XP, now in 2024) or having a bulk export offering the plain PDF files and text-searchable metadata. I clearly prefer the bulk export method.

Is there such a bulk export in Mayan EDMS? I’ve spent several hours in the forum and knowledge base but couldn’t find such a menu item.

There was once a feature close to that using Django serialization. It was removed because serialization is a very slow process and because there is no direct way to translate a database into JSON, CSV, or any of the other formats that users were expecting.

A single document may have many other objects attached to it, so an export has to include the values and database IDs of all those objects for each document, which results in a very large file.

For these cases it is best to create a custom module that loads only the database models that are relevant to you and serializes them in your preferred format. This module must run out of band from the main Mayan stack, as a command line utility, to avoid affecting the rest of the system. And just like a backup, it should run with the stack shut down to avoid serializing corrupted data that is being manipulated during the export.
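As a rough illustration of the serialization half of such a module, here is a minimal sketch that writes one JSON file per document. It only assumes that you have already loaded the relevant records into plain dictionaries; the function name `export_documents` and the keys `id`, `label`, `metadata` and `events` are hypothetical and are not Mayan EDMS field names. Loading the records from the database is deliberately left out.

```python
import json
from pathlib import Path


def export_documents(documents, output_dir):
    """Write one JSON file per document record and return the paths written.

    `documents` is an iterable of plain dicts. The keys used in this sketch
    ("id", "label", "metadata", "events") are hypothetical placeholders,
    not actual Mayan EDMS field names.
    """
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    written = []
    for document in documents:
        target = output_path / f"document_{document['id']}.json"
        with target.open("w", encoding="utf-8") as handle:
            # indent and sort_keys keep the file readable for humans
            # while it stays valid JSON for machines.
            json.dump(document, handle, indent=2, sort_keys=True, ensure_ascii=False)
        written.append(target)
    return written
```

Writing each document to its own file keeps memory use flat and lets an interrupted export be resumed, at the cost of producing many small files.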

The best aspect of Mayan EDMS is that it is free software, so an export can be done at any time during your retention period as long as you keep at least one copy of the database and one of the code. We’ve done migrations and even upgrades from installations that have been offline for several years.

This is important because, without knowing the next system’s schema or import requirements, you cannot determine how usable an export made today will be.

If you want to retain your data in the most stable and reusable format possible, then a text SQL dump of the database (not a PostgreSQL custom binary dump) will always be your best, most usable and most compatible solution. This is how we migrate most of our enterprise customers, and we frequently do multimillion-document migrations with this method.
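For reference, a plain-text dump can be requested from `pg_dump` with `--format=plain`. The sketch below wraps that call in Python; the database name, host, user and output file are assumptions for illustration and need to match your own installation, and the helper names are hypothetical.

```python
import subprocess


def build_pg_dump_command(database, output_file, host="localhost", user="postgres"):
    """Return the pg_dump argument list for a plain-text SQL dump."""
    return [
        "pg_dump",
        "--format=plain",  # plain SQL text rather than the custom binary format
        f"--host={host}",
        f"--username={user}",
        f"--file={output_file}",
        database,
    ]


def run_dump(database, output_file, **kwargs):
    """Run pg_dump and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pg_dump_command(database, output_file, **kwargs), check=True)
```

For example, `run_dump("mayan", "mayan.sql")` would write the dump to `mayan.sql`, assuming a database named `mayan` is reachable with those defaults.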

Backing up a Mayan EDMS Docker Compose stack PostgreSQL service
