Bulk Export of All Documents and Metadata to Human Readable Format

There was once a similar feature based on Django serialization. It was removed because serialization is a very slow process and because there is no direct way to translate a database into JSON, CSV, or whatever other format users expect.

A single document may have many other objects attached to it, which means the export must include the values and database IDs of every attached object for each document, and that results in a very large file.

For these cases it is best to create a custom module that loads only the database models that are relevant to you and serializes them in your preferred format; a sketch of such a module follows below. This module must run out of band from the main Mayan stack, as a command line utility, to avoid affecting the rest of the system. And just like a backup, it should run with the stack shut down to avoid serializing data that is being modified during the export.
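As a hedged illustration only, a standalone script along these lines could serve as that out-of-band exporter. The settings module (`mayan.settings.production`) and the model import paths (`mayan.apps.documents.models.Document`, `mayan.apps.metadata.models.DocumentMetadata`) are assumptions about the installation and may need adjusting for your version; load whichever models matter to you.

```python
# Minimal sketch of an out-of-band export script. Run it with the Mayan
# stack stopped, inside the same virtualenv or container image so the
# Mayan code and settings are importable. The settings module path and
# model import paths below are assumptions and may differ per version.
import json
import os

import django

# Assumed settings module; adjust to match your installation.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mayan.settings.production')
django.setup()

from django.core.serializers import serialize

# Assumed model locations; load only the models relevant to you.
from mayan.apps.documents.models import Document
from mayan.apps.metadata.models import DocumentMetadata


def export(path='export.json'):
    # Serialize documents and their metadata values to human readable
    # JSON. The default serializer output keeps primary keys, so records
    # can still be cross referenced against the original database.
    payload = {
        'documents': json.loads(serialize('json', Document.objects.all())),
        'metadata': json.loads(
            serialize('json', DocumentMetadata.objects.all())
        ),
    }
    with open(path, 'w') as file_object:
        json.dump(payload, file_object, indent=2)


if __name__ == '__main__':
    export()
```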

The best aspect of Mayan EDMS is that it is free software, so an export can be done at any time during your retention period as long as you keep at least one copy of the database and one copy of the code. We've done migrations and even upgrades from installations that had been offline for several years.

This matters because an export performed today, without knowing the schema or import requirements of the next system, gives you no way to judge whether that export will actually be usable.

If you want to retain your data in the most stable and reusable format possible, a plain text SQL dump of the database (not a PostgreSQL custom binary dump) will always be your most usable and compatible option. This is how we migrate most of our enterprise customers, and we frequently perform multi-million-document migrations with this method.
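For example, a plain text dump can be produced with `pg_dump` in `--format=plain` mode. The sketch below wraps the call in Python so it can live next to the export script; the host, username, and database name are placeholders for whatever your Docker Compose environment defines, and authentication (for instance via the `PGPASSWORD` environment variable or a `.pgpass` file) is left out.

```python
# Minimal sketch of producing a plain text SQL dump with pg_dump.
# The connection values are placeholders; substitute the ones from
# your Docker Compose environment, or run pg_dump directly inside
# the PostgreSQL container instead.
import subprocess

subprocess.run(
    [
        'pg_dump',
        '--format=plain',        # text SQL, not the custom binary format
        '--no-owner',            # easier to restore on a different server
        '--file=mayan_dump.sql',
        '--host=127.0.0.1',      # placeholder host
        '--username=mayan',      # placeholder username
        'mayan',                 # placeholder database name
    ],
    check=True,
)
```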

See also: Backing up a Mayan EDMS Docker Compose stack PostgreSQL service
