Hi,
I read a few posts and in particular the interesting articles on the use of UUID and “How storage works”. It seems that the use of UUID allows for more efficient management of different versions of the same document.
But as we are going to use Mayan EDMS to manage several tens of thousands of documents, I have to plan for the following: how will we recover the documents if we no longer want to use Mayan EDMS?
Will they have their original name, format and file metadata?
Will they be classified in folders according to their type, their categories, or other criteria?
Thank you for your clarification
Hi,
It seems that the use of UUID allows for more efficient management of different versions of the same document.
Yes, without using UUIDs for filenames it would be impossible to have two documents with the same name.
UUIDs for document database entries are just there to provide a unique ID, which the database ID field does not guarantee.
But as we are going to use Mayan EDMS to manage several tens of thousands of documents, I have to plan for the following: how will we recover the documents if we no longer want to use Mayan EDMS?
Will they have their original name, format and file metadata?
Will they be classified in folders according to their type, their categories, or other criteria?
Breaking this paradigm is Mayan’s biggest difference and strength. Most document management systems are just a web frontend to a file manager. They still depend on folder structures, not just for operation but also for categorization. In Mayan, documents exist in a single giant virtual flat structure and categorization is just a business operation. This way the same document can belong to multiple categorization units, something difficult or even impossible to accomplish in other products.
When exporting documents, the resulting filename will depend on what you use business-wise as the uniqueness of a document. This can be a sequence like a serial number, or product ID, project number, permit number. This will be different for each installation.
If you want to at least preserve the original filename the file had before it became a Mayan document, use the “Original” or “UUID plus original” backend for each document type. This will attempt to preserve the original filename, with the caveat that it takes control away from Mayan, and it will be up to the filesystem to resolve filename collisions. Corruption could occur that Mayan would not be able to reconcile.
UUIDs are the default for filenames because they suffer from very few collisions and are short enough that almost any filesystem will support them; some filesystems won’t allow filenames beyond a certain character count, most of the time 255 characters. So if your original filename is 250 characters long and you then add the UUID, the combined name could exceed that limit, which could cause two files to end up with the same filename, breaking the database-to-storage association.
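To make the length arithmetic concrete, here is a rough Python sketch. The `<uuid>-<original>` naming pattern and the helper names are assumptions made for illustration only, not how Mayan actually builds the name; the 255 limit is the common figure mentioned above and varies by filesystem.

```python
import uuid

# Common per-filename limit mentioned above; an assumption, it varies by filesystem.
NAME_MAX = 255


def combined_length(original_name: str) -> int:
    """Length of a hypothetical '<uuid>-<original>' filename."""
    # The textual form of a UUID is always 36 characters (32 hex digits, 4 hyphens).
    return len(f"{uuid.uuid4()}-{original_name}")


def fits(original_name: str, limit: int = NAME_MAX) -> bool:
    """True if the combined name stays within the filename limit."""
    return combined_length(original_name) <= limit


if __name__ == "__main__":
    long_name = "a" * 246 + ".pdf"  # a 250-character original filename
    print(combined_length(long_name), fits(long_name))
```

With a 250-character original name, the combined name in this sketch is 287 characters, well past a 255 limit, while a UUID on its own always stays at 36.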
If extracting documents from Mayan is a priority, you can use an API client that downloads the documents and then stores them in folders based on cabinets or tags. However, if your documents have multiple tags or can reside in multiple cabinets or indexes (like clients and projects), exporting them to a simple folder structure will not be possible without some consolidation logic performed at export. Basically, you would be attempting to store N-dimensional data (Mayan’s database) in a 2-dimensional layout (folders).
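As an illustration of what such consolidation logic could look like, here is a minimal Python sketch. It does not call Mayan's API at all: the `id`, `label` and `cabinets` keys, the `plan_export` function and the `_uncategorized` folder are all hypothetical, and the input is assumed to have been fetched already by whatever API client you use.

```python
from pathlib import PurePosixPath


def plan_export(documents, duplicate=True):
    """Map each export path to a document ID.

    documents: iterable of dicts with the hypothetical keys 'id',
    'label' (filename) and 'cabinets' (list of cabinet names).
    duplicate=True places a copy in every cabinet folder;
    duplicate=False keeps only the alphabetically first cabinet.
    """
    plan = {}
    for doc in documents:
        # Documents without any cabinet go to a catch-all folder.
        cabinets = sorted(doc["cabinets"]) or ["_uncategorized"]
        if not duplicate:
            cabinets = cabinets[:1]
        label = PurePosixPath(doc["label"])
        for cabinet in cabinets:
            path = PurePosixPath(cabinet) / label.name
            n = 1
            # Two documents may share a label; add a numeric suffix on collision.
            while str(path) in plan:
                path = PurePosixPath(cabinet) / f"{label.stem}_{n}{label.suffix}"
                n += 1
            plan[str(path)] = doc["id"]
    return plan


if __name__ == "__main__":
    docs = [
        {"id": 1, "label": "contract.pdf", "cabinets": ["Clients", "Projects"]},
        {"id": 2, "label": "contract.pdf", "cabinets": ["Clients"]},
        {"id": 3, "label": "notes.txt", "cabinets": []},
    ]
    for path, doc_id in plan_export(docs).items():
        print(doc_id, path)
```

The `duplicate` switch is the consolidation decision in miniature: either accept several copies of the same document on disk, or pick one "primary" location and lose the other memberships in the folder layout.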
I am working on an article to show how to extract documents using an API client.
Hi Marcs,
The files themselves are stored in their original format; they are never touched by Mayan EDMS. Nor are they stored in subfolders or under their original names by default: as you mentioned, the UUID is used to store them in Mayan’s storage folder. Everything else (metadata, tags, OCR) is built around them in the database.
To recover them as their original files, you need information from the database. You can easily associate the database records with the files in the storage folder and copy the files back under their original filenames.
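A rough Python sketch of that copy-back step, assuming you have already pulled a mapping of storage filename to original filename out of the database. The `restore` function and the shape of `mapping` are made up for illustration; how you obtain the mapping depends on your installation.

```python
import shutil
from pathlib import Path


def restore(storage_dir, output_dir, mapping):
    """Copy UUID-named files to output_dir under their original names.

    mapping: dict of storage filename -> original filename (hypothetical;
    taken from the database by whatever query fits your setup).
    Returns the list of paths that were written.
    """
    storage_dir, output_dir = Path(storage_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stored_name, original_name in mapping.items():
        source = storage_dir / stored_name
        if not source.is_file():
            continue  # database entry without a matching file: skip it
        original = Path(original_name)
        target = output_dir / original.name
        n = 1
        # Original names are not unique, so add a numeric suffix on collision.
        while target.exists():
            target = output_dir / f"{original.stem}_{n}{original.suffix}"
            n += 1
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        written.append(target)
    return written
```

The storage folder is only read, never modified, so this can be tried against a copy of the storage directory without any risk to the running installation.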
Torsten