Export all Documents

voarsh
Posts: 3
Joined: Tue Dec 22, 2020 4:01 pm

Re: Export all Documents

Post by voarsh »

zbig wrote: Wed Sep 02, 2020 11:55 am
KevinPawsey wrote: Tue Nov 27, 2018 12:16 am Silly question, but can’t you just go to Documents/All Documents.

Once that page has loaded, on the top right there is a drop down with a tick box next to it. Tick the box, that selects all documents, then go to “Advanced download” in the drop down. From there you can download all documents in a zip file.

Maybe that will work? I haven’t tried it myself, just noticed the option the other day.
This is great advice and a simple solution that actually works, thank you! No idea why it got ignored. I've done just that and got a handy ZIP file containing all the documents with their original file names. The only slight downside is that Mayan's "Select all" only selects the documents on the current page, but that's nothing a quick adjustment of "COMMON_PAGINATE_BY" can't fix. I guess you could run into memory problems during ZIP compression on very large archives, but then you can just split the export into chunks of, say, 500 documents and you're good. Not a big deal.
Thumbs up. All good advice.

I think Mayan can do a better job with doc exports.
I can't do a DB dump or any migration because of "issues", so I am left with this as a last resort.
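
For anyone who wants to script the "chunks of 500" idea from the quote above, here is a minimal sketch of the batching step only. The list of document IDs is made up for illustration; how you obtain the IDs and trigger each download is not shown and depends on your setup.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[int], size: int = 500) -> Iterator[List[int]]:
    """Yield successive lists holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical document IDs; in practice these would come from Mayan.
document_ids = range(1, 1201)
batches = list(chunked(document_ids, 500))
print([len(batch) for batch in batches])  # prints [500, 500, 200]
```

Each batch can then be exported as its own ZIP file, which keeps the size of any single archive bounded.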
noses
Posts: 8
Joined: Thu Sep 24, 2020 12:05 am

Re: Export all Documents

Post by noses »

What about writing a short Python program that extracts all documents AND their metadata and stores them in files?

Due to legal requirements (and to avoid vendor lock-in) I wrote a backup script for that purpose. It requests every document on the server and saves it locally, and it deals with certain uglinesses such as the paged output of the API. It works well for me... If you don't erase the destination directory between runs, you even keep copies of documents that have since been removed from the archive.
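
This is not the script described above, only a rough sketch of how the paged API output could be walked. It assumes each page is a JSON object with a "results" list and a "next" URL that is null on the last page (the usual Django REST framework layout); the starting URL and authentication are left to the caller, and the function names are made up.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional


def fetch_json(url: str) -> Dict:
    """Fetch one page of JSON. Authentication is omitted in this sketch."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_results(
    first_url: str,
    fetch_page: Callable[[str], Dict] = fetch_json,
) -> Iterator[Dict]:
    """Yield every item from a paginated listing, following "next" links.

    Assumes each page looks like {"results": [...], "next": url_or_None}.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")
```

Passing the fetch function in as a parameter keeps the paging logic separate from HTTP details, so a version that adds an API token header can be swapped in without touching the loop.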
FreddieD
Posts: 1
Joined: Thu Jul 16, 2020 3:05 pm

Re: Export all Documents

Post by FreddieD »

If this helps, you can go into the Postgres database and run this query:

Code:

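select dd.label, ddv.file
from documents_document dd
join documents_documentversion ddv on ddv.document_id = dd.id

This links the GUID filename stored in media/document_storage to the original document filename. (The join goes through the version's document_id foreign key; matching dd.id to ddv.id would only pair the right rows where the two IDs happen to coincide.) A document with several versions gives one row per version. From there, you can export the result to a CSV or tab-delimited file and run a copy command on each record.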
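
The "copy command on each record" step could be sketched in Python roughly as follows. It assumes the query result was exported as a CSV file with a header row naming the columns label and file, and that the stored files sit directly inside the storage directory; the function name and all paths are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def copy_with_labels(csv_path, storage_dir, dest_dir):
    """Copy each stored file to dest_dir under its original label.

    Returns the list of paths that were written.
    """
    storage = Path(storage_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Assumption: only the last path component names the stored file.
            source = storage / Path(row["file"]).name
            if not source.is_file():
                continue  # skip records whose file is missing on disk
            label = Path(Path(row["label"]).name)
            target = dest / label
            counter = 1
            # Two documents (or versions) may share a label; do not overwrite.
            while target.exists():
                target = dest / f"{label.stem}_{counter}{label.suffix}"
                counter += 1
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

A call would look like copy_with_labels("export.csv", "media/document_storage", "restored"), where export.csv and restored are placeholder names. Try it on a copy of the storage directory first.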