Mayan and outlook

rduz
Posts: 7
Joined: Thu Sep 26, 2019 8:34 pm

Post by rduz » Sun Sep 29, 2019 8:45 pm

Hi,

We are considering moving about 300k documents to Mayan and we've been evaluating it for a week or so now through the docker images. For the most part, we see Mayan as a positive tool that we'd like to use.

We've run into a couple of issues that I'd like to ask about:
  • Our documents are about 2/3 PDF and 1/3 Outlook .msg files. The PDFs import and convert pretty well, but the .msg files not so much. I think there's some Linux support for reading .msg files. As a dev with no Mayan experience at all, I'd like to know how difficult it might be to add support for previewing currently unsupported document types. We'd at least like to be able to preview the text of the email; it would be a great plus for us to list attachments, and perhaps even to index the contents of the attachments.
  • Speaking of importing, I created a ~500 page text file filled with random dictionary words, opened it in LibreOffice, and exported it to PDF with no encryption or password. I imported the document to Mayan; Mayan can show all the pages in the page preview and it knows the page count. However, there is no 'content' data nor OCR data, so I presume that step failed for some reason. Nothing is listed on the system -> tools -> OCR errors page. Is there another log somewhere that I should be reviewing?

    Here's the PDF info:
    Creator: Writer
    Producer: LibreOffice 3.5
    CreationDate: Fri Sep 13 09:05:27 2019
    Tagged: no
    Pages: 501
    Encrypted: no
    Page size: 612 x 792 pts (letter)
    File size: 1521548 bytes
    Optimized: no
    PDF version: 1.4
  • I notice it takes several clicks to get at the document when you'd like to open the actual file instead of working with the preview. For example, from recently added, I think I have to click the document link to access the preview, then choose quick download, then open the file from the browser footer bar. I'd prefer to have an 'open file' command available, maybe a link or button on the preview, or at least an option in the action menu. Rather than forcing a download every time, I'd like an option to send the 'Content-Disposition: inline; filename="filename.pdf"' header so we can skip some of the extra clicks. How difficult might it be for a new-to-Mayan dev to add an "open file" option to Mayan?
Thank you for any insight you can offer.

Regards,


rduz

rosarior
Posts: 406
Joined: Tue Aug 21, 2018 3:28 am

Re: Mayan and outlook

Post by rosarior » Tue Oct 01, 2019 4:11 am

Hi,
Our documents are about 2/3 PDF and 1/3 Outlook .msg files. The PDFs import and convert pretty well, but the .msg files not so much. I think there's some Linux support for reading .msg files. As a dev with no Mayan experience at all, I'd like to know how difficult it might be to add support for previewing currently unsupported document types. We'd at least like to be able to preview the text of the email; it would be a great plus for us to list attachments, and perhaps even to index the contents of the attachments.
Supporting .msg, .eml, and other packaged email files is on the roadmap. There is no timeline for adding this feature because of the unknowns surrounding some of the formats, like .msg, which are proprietary. When it comes to enterprise features, we also do collaborations to help fund and fast-track them. Here is an example of one such collaboration with Berkeley County administration to add workflow improvements, redactions, and security auditing: https://www.mayan-edms.com/post/collaborations/
Speaking of importing, I created a ~500 page text file filled with random dictionary words, opened it in LibreOffice, and exported it to PDF with no encryption or password. I imported the document to Mayan; Mayan can show all the pages in the page preview and it knows the page count. However, there is no 'content' data nor OCR data, so I presume that step failed for some reason. Nothing is listed on the system -> tools -> OCR errors page. Is there another log somewhere that I should be reviewing?
If the deployment was done using Docker, execute

docker logs <container name>
to see additional error information. It could have been an out of memory error and the OS killed the process. Can you share the PDF file so we can run tests?
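If the logs show nothing, the container state is another place to look. Below is a minimal Python sketch that reads the JSON printed by `docker inspect <container name>` and reports the `OOMKilled` flag and exit code for each container. The function name `oom_report` is made up for illustration. The flag may not be set when only a worker process inside a still-running container was killed, so treat a `False` as inconclusive and check the host kernel log as well.

```python
import json


def oom_report(inspect_output):
    """Summarize `docker inspect` JSON as (name, oom_killed, exit_code) tuples.

    `inspect_output` is the raw JSON text printed by `docker inspect`,
    which is a list with one object per inspected container.
    """
    report = []
    for container in json.loads(inspect_output):
        state = container.get('State', {})
        report.append((
            container.get('Name', ''),
            # Missing key is treated as "not flagged", not as proof of health.
            bool(state.get('OOMKilled', False)),
            state.get('ExitCode'),
        ))
    return report
```

One way to feed it is `subprocess.run(['docker', 'inspect', name], capture_output=True, text=True).stdout`, where `name` is your container name.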
I notice it takes several clicks to get at the document when you'd like to open the actual file instead of working with the preview. For example, from recently added, I think I have to click the document link to access the preview, then choose quick download, then open the file from the browser footer bar. I'd prefer to have an 'open file' command available, maybe a link or button on the preview, or at least an option in the action menu. Rather than forcing a download every time, I'd like an option to send the 'Content-Disposition: inline; filename="filename.pdf"' header so we can skip some of the extra clicks. How difficult might it be for a new-to-Mayan dev to add an "open file" option to Mayan?
This is not a hard feature to add. It will consist mostly of copying the current document download code to create a sibling link, view, and URL route.

We can add this feature to our work list, or you can add it and submit it for inclusion via merge request. Anyone submitting a new feature needs to fill out the Contributor Assignment Agreement; otherwise we won't be able to accept the code.

For institutions fill out this version:
https://docs.mayan-edms.com/topics/deve ... -agreement

For individual developers fill out this version:
https://docs.mayan-edms.com/topics/deve ... -agreement

Hope this helps. Thanks.

rduz
Posts: 7
Joined: Thu Sep 26, 2019 8:34 pm

Re: Mayan and outlook

Post by rduz » Tue Oct 01, 2019 4:05 pm

Hi,

Thanks so much for getting back to me.

On number one, that makes total sense, as I've been down the road of supporting vendor-proprietary formats.

I never did find anything regarding OCR failure in the docker log, and it happened on two separate imports. I moved both documents to the trash, emptied the trash, then reimported, and this time it worked fine. Not sure why that would be, but perhaps the OS killed the processes as you mention. I tried to attach the PDF here, but PDF is apparently not allowed as an attachment; the upload failed with the error "Invalid file extension: lines.pdf". I am unconcerned, though, because it seems to be working now.

On number three, I've been reviewing the code, so thanks for the encouragement.

Regards,

rduz

rduz
Posts: 7
Joined: Thu Sep 26, 2019 8:34 pm

Re: Mayan and outlook

Post by rduz » Tue Nov 05, 2019 11:03 pm

Hi Roberto,

I took a stab at the content-disposition inline vs attachment issue and was surprised to see that django_downloadview doesn't actually support inline. Without a code change, setting 'attachment' to 'false' causes no Content-Disposition header to be created at all. In Google Chrome (at least), if there isn't any Content-Disposition header, the browser just spins for 30 seconds and throws an error. It seems odd to me that such support would not already be in django_downloadview, but I am new to this space. I get the feeling I might be barking up the wrong tree.

For the moment, I've made the following change:

root@localhost:/opt/mayan-edms/lib/python2.7/site-packages/django_downloadview# diff response.orig.py response.py
57c57
< def content_disposition(filename):
---
> def content_disposition(self, filename):
74a75,79
>     if self.attachment:
>         disposition = 'attachment'
>     else:
>         disposition = 'inline'
>
76c81
<         return 'attachment'
---
>         return disposition
80c85,87
<         return "attachment; filename=\"{ascii}\"".format(ascii=ascii_filename)
---
>         return "{disposition}; filename=\"{ascii}\"" \
>                 .format(disposition=disposition,
>                         ascii=ascii_filename)
82,83c89,91
<         return "attachment; filename=\"{ascii}\"; filename*=UTF-8''{utf8}" \
<                .format(ascii=ascii_filename,
---
>         return "{disposition}; filename=\"{ascii}\"; filename*=UTF-8''{utf8}" \
>                .format(disposition=disposition,
>                        ascii=ascii_filename,
185,187c193,194
<             if self.attachment:
<                 basename = self.get_basename()
<                 headers['Content-Disposition'] = content_disposition(basename)
---
>             basename = self.get_basename()
>             headers['Content-Disposition'] = content_disposition(self, basename)
Any thoughts or advice would be appreciated.
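
For anyone who wants to try the same idea without patching site-packages, the logic in the diff above boils down to something like the standalone sketch below. It is written for Python 3 rather than the Python 2.7 install shown in the diff, and it is not the django_downloadview code itself: the `attachment` keyword argument and the quote escaping are assumptions added for illustration.

```python
import unicodedata
from urllib.parse import quote


def content_disposition(filename, attachment=True):
    """Build a Content-Disposition header value.

    Standalone illustration only; the `attachment` keyword is an
    assumption, not part of django_downloadview's API.
    """
    disposition = 'attachment' if attachment else 'inline'
    if not filename:
        return disposition

    # ASCII fallback: decompose accented characters, drop anything non-ASCII.
    ascii_filename = (
        unicodedata.normalize('NFKD', filename)
        .encode('ascii', 'ignore')
        .decode('ascii')
    )
    # Percent-encoded UTF-8 form for the `filename*` parameter.
    utf8_filename = quote(filename)
    # Escape backslashes and double quotes so the quoted string stays valid.
    quoted_ascii = ascii_filename.replace('\\', '\\\\').replace('"', '\\"')

    if ascii_filename == utf8_filename:
        return '{0}; filename="{1}"'.format(disposition, quoted_ascii)
    return "{0}; filename=\"{1}\"; filename*=UTF-8''{2}".format(
        disposition, quoted_ascii, utf8_filename
    )
```

Calling `content_disposition('report.pdf', attachment=False)` gives `inline; filename="report.pdf"`, which is the header value I was after.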

Thank you.

Regards,
rduz

rduz
Posts: 7
Joined: Thu Sep 26, 2019 8:34 pm

Re: Mayan and outlook

Post by rduz » Wed Nov 06, 2019 8:46 pm

And never mind. I implemented the code, and Google Chrome, and I believe Edge, both ignore the content disposition and download by default.

"This is the way Chrome handles file downloads. Google decided that rather than allowing applications to automatically open files downloaded through Chrome, the files must be downloaded to your local machine. There is, unfortunately, no workaround for this. If you want to automatically open files from Sharepoint to be worked in and saved back up to SP, you'll need to use a different browser like IE or Firefox."

Actually, there is a sort of workaround, but it's systemwide per file type:

"Click on the link to the excel document (in chrome) At the bottom left, you’ll see the spreadsheet icon as it downloads. Instead of clicking to open, click on the little arrow to the right and select "Always open files of this type".
You’ll have to do this for each file type."

Thank you.

Regards,
rduz
