Bulk import

james
Posts: 10
Joined: Fri Jan 08, 2021 6:15 pm

Bulk import

Post by james

Hello
I have seen several threads about importing but no definitive answer or example scripts.

I have several thousand documents I would like to import with the details stored in a spreadsheet. I can happily morph this into a particular format and can add the document filename (once scanned). What is the best way to import this and is there a recommended batch size?

I bought the book, but it's not covered in there either.

Good advice always welcome

Thanks
James
rosarior
Developer
Posts: 649
Joined: Tue Aug 21, 2018 3:28 am
Location: Puerto Rico

Re: Bulk import

Post by rosarior

Hi,

We have been developing an import app that has drivers for different sources (like CSV) and online/cloud providers.

However, it missed the inclusion deadline for version 4.0. Hopefully it will be included in version 4.1.
bwakkie
Posts: 33
Joined: Fri Feb 14, 2020 8:28 pm

Re: Bulk import

Post by bwakkie

Hi James,

I did this before with 10000 documents: viewtopic.php?f=7&t=3133

If your spreadsheet links each row to a unique filename, you can use the following approach. It's not perfect, but it got my metadata in:
  1. Create a staging_folder next to your watch_folder and copy all your documents into it.
  2. Change the ownership of all documents to mayan:mayan.
  3. Move all files at once into your watch_folder.
  4. Wait until all documents are loaded (i.e. no documents are left in the watch_folder). Sometimes processing hangs on individual documents; move those out of the watch_folder and deal with them separately.
  5. In Mayan, create a metadata type for each spreadsheet column you want to import.
  6. Load your spreadsheet into a database table (called spreadsheet_metadata here).
  7. Find out which ID corresponds to which metadata type (SELECT * FROM public.metadata_metadatatype).
    • (If any of your values are longer than 255 characters, change the type of the 'value' column in metadata_documentmetadata to text.)
  8. For each metadata type, I run the following script to attach the metadata to the documents:

Code:

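-- Attach the spreadsheet's author column as metadata type 3; repeat per column/type ID
INSERT INTO metadata_documentmetadata
(value,document_id,metadata_type_id)
	SELECT meta.author,doc.id,3
	FROM public.documents_document doc
	-- Match each document to its spreadsheet row via the unique filename (meta.reference)
	INNER JOIN spreadsheet_metadata meta ON doc.meta::text = meta.reference
	-- Skip empty cells
	WHERE meta.author IS NOT NULL
-- Each document holds a given metadata type only once; keep any existing value
ON CONFLICT (document_id, metadata_type_id) DO NOTHING;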
In the command above, I add all author values from the spreadsheet as metadata_type_id 3 (the author metadata type). I am sure it would be possible to do all metadata types in one go.
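For the "in one go" variant, one possible approach in PostgreSQL is to unpivot each spreadsheet row with CROSS JOIN LATERAL (VALUES ...), so every column becomes a (value, metadata type ID) pair that a single INSERT can consume. This is a minimal sketch, not a tested recipe: the title and year columns and the type IDs 4 and 5 are hypothetical placeholders (only author = 3 comes from the post above), the join doc.meta::text = meta.reference is the same one used above, and the spreadsheet columns are assumed to be text (cast others with ::text). Like the per-column script, it writes straight into Mayan's tables, so take a database backup first.

```sql
-- Sketch: attach several spreadsheet columns in one statement (PostgreSQL).
-- Assumptions: spreadsheet_metadata has text columns reference, author,
-- title and year (title/year are hypothetical), and type IDs 4 and 5 are
-- placeholders for whatever your metadata_metadatatype lookup returns.
BEGIN;

INSERT INTO metadata_documentmetadata (value, document_id, metadata_type_id)
SELECT v.value, doc.id, v.metadata_type_id
FROM public.documents_document doc
INNER JOIN spreadsheet_metadata meta ON doc.meta::text = meta.reference
-- Turn each spreadsheet row into one (value, type ID) pair per column
CROSS JOIN LATERAL (
    VALUES
        (meta.author, 3),  -- author (type 3, as in the post above)
        (meta.title,  4),  -- hypothetical title type
        (meta.year,   5)   -- hypothetical year type
) AS v(value, metadata_type_id)
-- Skip empty cells, as in the per-column version
WHERE v.value IS NOT NULL
-- Keep existing values if a document already has that metadata type
ON CONFLICT (document_id, metadata_type_id) DO NOTHING;

COMMIT;
```

If you run this interactively in psql, you can stop before COMMIT, check the counts in metadata_documentmetadata, and issue ROLLBACK instead if they look wrong.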
james
Posts: 10
Joined: Fri Jan 08, 2021 6:15 pm

Re: Bulk import

Post by james

Thank you, that is useful.