Mayan for Bioinformatics Data Management

asclepios
Posts: 3
Joined: Thu Oct 24, 2019 1:11 pm

Mayan for Bioinformatics Data Management

Post by asclepios » Fri Oct 25, 2019 5:56 pm

Hi,

I work for a genomics research center.

We are planning to use Mayan for our protocol and SOP management, and we are also looking for a solution to track our data files. Our genomics data files are basically big plaintext files (1-10 GB per file). We mainly want to archive, read, and gather extra metadata for these files, so Mayan looks like a good solution.

I am wondering whether Mayan would perform well on such big files. Are there any configuration parameters that I should adjust to ease the management of these big files?

Regards,

rosarior
Posts: 406
Joined: Tue Aug 21, 2018 3:28 am

Re: Mayan for Bioinformatics Data Management

Post by rosarior » Mon Oct 28, 2019 12:31 am

Hi,

This is an interesting deployment! Mayan should work well with files of any size. Files are processed in the background after upload: the process detects the MIME type for conversion and preview, and determines the page count. How quickly uploaded documents appear and become ready for use depends on the resources you devote to the install. There are no inherent page size or document count limits anywhere in the code.
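As a rough illustration of why type detection does not have to get slower as files grow, here is a minimal sketch that decides whether a file looks like plain text by reading only its first few kilobytes. This is not Mayan's detection code; the function name `looks_like_text` and the 8 KB sample size are assumptions made up for the example.

```python
def looks_like_text(path, sample_size=8192):
    """Guess whether a file is plain text from its first `sample_size` bytes.

    Hypothetical helper for illustration only; it is not part of Mayan.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)

    # NUL bytes almost never occur in plain text files.
    if b"\x00" in sample:
        return False

    try:
        sample.decode("utf-8")
    except UnicodeDecodeError as error:
        # Heuristic: tolerate a failure in the last three bytes, which is
        # usually a multi-byte character cut off by the sample boundary.
        return error.start >= len(sample) - 3
    return True
```

Because only the sample is read, the cost is about the same for a 1 KB file and a 10 GB file.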

The Docker image is a one-size-fits-all build that favors small to medium installs. It is a good choice to get started, but it trades scalability for ease of install, so it might not be the best option for your deployment in the long term. I suggest a direct deployment or a custom Docker image. Also, launch multiple workers for the document processing queue to avoid a bottleneck when uploading many big documents at the same time.

The other change I recommend is storage. If you can, use block storage for performance, or object storage if your document count is going to be big.

Mayan has many settings that can be tweaked to optimize for many different workloads, but those are the initial general suggestions.

asclepios
Posts: 3
Joined: Thu Oct 24, 2019 1:11 pm

Re: Mayan for Bioinformatics Data Management

Post by asclepios » Mon Oct 28, 2019 12:38 pm

Hi,

Thanks for the reply. I will definitely apply the changes you are suggesting. I will also test both block storage and object storage (Minio) and see which performs best for our situation.
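For the storage comparison, a first pass could be as simple as timing a chunked write and read-back on each candidate location. The sketch below assumes the storage is reachable as a filesystem path (a mounted block volume, for example); testing Minio directly would need an S3 client instead, which is not shown. The function name `measure_throughput` and the default sizes are placeholders chosen for this example.

```python
import os
import tempfile
import time


def measure_throughput(directory, size_mb=64, chunk_mb=4):
    """Write, then read back, a test file in `directory`.

    Returns (write_mb_per_s, read_mb_per_s). Hypothetical helper.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunk_count = size_mb // chunk_mb
    written_mb = chunk_count * chunk_mb

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Timed write, flushed to the device so buffering does not hide the cost.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(chunk_count):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())
        write_seconds = time.perf_counter() - start

        # Timed sequential read in the same chunk size.
        start = time.perf_counter()
        total_bytes = 0
        with open(path, "rb") as handle:
            while True:
                data = handle.read(len(chunk))
                if not data:
                    break
                total_bytes += len(data)
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)

    if total_bytes != written_mb * 1024 * 1024:
        raise RuntimeError("read back fewer bytes than were written")
    return written_mb / write_seconds, written_mb / read_seconds
```

Calling `measure_throughput("/path/to/mount", size_mb=2048)` on each candidate gives comparable numbers. The read figure will be inflated by the operating system's page cache unless the test file is larger than available RAM, so the write figure is the more trustworthy of the two.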
