cannot preview pptx file

calvinsug
Posts: 13
Joined: Mon Sep 30, 2019 5:29 am

cannot preview pptx file

Post by calvinsug » Wed Oct 16, 2019 3:31 am

Hi, I just tried to upload a .pptx file and the preview doesn't work. Then I tried the older PowerPoint format (.ppt) and the preview worked!
Is this a bug with .pptx files, or does Mayan just not support the .pptx format?

lantonov
Posts: 5
Joined: Sun Oct 13, 2019 7:49 pm

Re: cannot preview pptx file

Post by lantonov » Wed Oct 16, 2019 5:36 am

The same happens with .doc and .docx. Probably no MIME type has been defined for the newer Microsoft Office formats.

rosarior
Posts: 406
Joined: Tue Aug 21, 2018 3:28 am

Re: cannot preview pptx file

Post by rosarior » Thu Oct 17, 2019 12:09 am

Hi,

Microsoft Office formats, like most proprietary formats, are not standard and are hard to support. This is by design, to allow the respective companies to control the use of their files.

One issue is that their MIME types are obscure and not well documented. We add as many as we find in the wild (https://gitlab.com/mayan-edms/mayan-edm ... rals.py#L5).
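
Adding MIME types for the newer formats comes down to mapping each extension to its official Office Open XML type string. A minimal sketch using Python's standard `mimetypes` module, offered only as an illustration of extension-based mapping and not as Mayan's own registration code; the names `OOXML_TYPES`, `registry`, and `guess_by_extension` are hypothetical:

```python
import mimetypes

# Official MIME types for the Office Open XML (2007+) formats.
# OOXML_TYPES is a hypothetical name used only in this sketch.
OOXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}

# A private MimeTypes instance, so the module-wide registry is left untouched.
registry = mimetypes.MimeTypes()
for extension, mime_type in OOXML_TYPES.items():
    registry.add_type(mime_type, extension)


def guess_by_extension(filename):
    """Return the MIME type implied by the file name, or None if unknown."""
    mime_type, _encoding = registry.guess_type(filename)
    return mime_type


print(guess_by_extension("slides.pptx"))
```

Note that this only looks at the file name. A detector that inspects the file's bytes ignores the extension entirely, so a mapping like this helps only where the extension can be trusted.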

Microsoft also decided a few years back to package their Office file formats as Zip-compressed archives. This confuses MIME type detection, which correctly identifies the file as a Zip file rather than an Office file. There is little we can do on our side to improve support for these formats as long as companies like Microsoft keep doing these things.
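
To see why content-based detection lands on "Zip", it helps to look at the first bytes of such a file. A minimal demonstration with Python's standard `zipfile` module; the archive built here only borrows a couple of OOXML-style member names and is a placeholder, not a valid Word document (the helper name `build_minimal_docx_like_zip` is hypothetical):

```python
import io
import zipfile

# Local file header signature that starts every Zip archive.
ZIP_SIGNATURE = b"PK\x03\x04"


def build_minimal_docx_like_zip():
    """Build an in-memory Zip with OOXML-style member names (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


data = build_minimal_docx_like_zip()
# A sniffer that only checks the leading bytes sees a plain Zip archive.
print(data[:4] == ZIP_SIGNATURE)  # True
```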

More details can be found in this blog post about Mayan's image previewer and image converter: https://www.mayan-edms.com/post/mayan-converter/
To make matters worse, some file types, like Microsoft DOCX, are just XML files packed into a Zip archive. This is why many programs detect DOCX files as compressed archives and try to handle them that way.
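
One way to get past the generic Zip result is a second pass that looks inside the archive. A minimal heuristic sketch, assuming that an OOXML package can be recognized by a `[Content_Types].xml` member plus a `word/`, `ppt/`, or `xl/` folder; the names `OOXML_MAIN_FOLDERS` and `refine_zip_mime_type` are hypothetical and this is not how Mayan currently detects files:

```python
import io
import zipfile

# Hypothetical mapping: main OOXML folder -> MIME type of the whole package.
OOXML_MAIN_FOLDERS = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def refine_zip_mime_type(file_obj):
    """Return an OOXML MIME type if the Zip looks like an Office file, else 'application/zip'.

    Heuristic only: requires [Content_Types].xml plus one known main folder.
    """
    with zipfile.ZipFile(file_obj) as archive:
        names = archive.namelist()
    if "[Content_Types].xml" not in names:
        return "application/zip"
    for folder, mime_type in OOXML_MAIN_FOLDERS.items():
        if any(name.startswith(folder) for name in names):
            return mime_type
    return "application/zip"
```

A check like this would only run after the first-pass detector has already reported `application/zip`, so ordinary Zip uploads fall through unchanged.
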
One solution we've been discussing for a long time now is adding the ability for users to force a specific MIME type for a document in case the automatic detection fails.
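
The override idea amounts to a simple precedence rule: a user-forced type wins, otherwise the detected type is used. A hypothetical sketch of that rule only; `MIME_OVERRIDES`, `force_mime_type`, and `effective_mime_type` are illustrative names, not an existing Mayan feature or API:

```python
from typing import Dict

# Hypothetical store of user-forced types: document id -> MIME type.
MIME_OVERRIDES: Dict[int, str] = {}


def force_mime_type(document_id: int, mime_type: str) -> None:
    """Record a user's choice of MIME type for one document."""
    MIME_OVERRIDES[document_id] = mime_type


def effective_mime_type(document_id: int, detected: str) -> str:
    """Prefer the forced type when present; otherwise keep the detected one."""
    return MIME_OVERRIDES.get(document_id, detected)
```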

lantonov
Posts: 5
Joined: Sun Oct 13, 2019 7:49 pm

Re: cannot preview pptx file

Post by lantonov » Thu Oct 17, 2019 4:13 pm

OpenOffice files, like LibreOffice .odt files, are also directories compressed in Zip format; however, the directory structure is different. For example, unarchiving an .odt file looks like:

Configurations2 (folder)
META-INF (folder)
Pictures (folder)
Thumbnails (folder)
content.xml
layout-cache
manifest.rdf
meta.xml
mimetype
settings.xml
styles.xml
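
The `mimetype` entry in this listing is what makes OpenDocument files comparatively easy to identify: the ODF specification places the package's MIME type, as plain text, in that member (stored as the first, uncompressed entry). A minimal sketch that reads it with Python's standard `zipfile` module; the function name `read_odf_mime_type` is hypothetical:

```python
import io
import zipfile


def read_odf_mime_type(file_obj):
    """Return the MIME type stored in an OpenDocument package's 'mimetype' member, or None."""
    with zipfile.ZipFile(file_obj) as archive:
        if "mimetype" not in archive.namelist():
            return None
        return archive.read("mimetype").decode("ascii").strip()
```

For an .odt file this yields `application/vnd.oasis.opendocument.text`.
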

Unarchiving an MS Word .docx file looks like:

_rels (folder)
customXml (folder)
docProps (folder)
word (folder)
[Content_Types].xml
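
OOXML has no `mimetype` member; the closest equivalent is `[Content_Types].xml`, whose `Override` entries give the content type of each part, including the main document part. A minimal sketch that maps the main part's type to the package MIME type for the three common formats only (macro-enabled and template variants are not covered); `CT_NS`, `MAIN_PART_TO_PACKAGE`, and `ooxml_mime_from_content_types` are hypothetical names:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in OOXML packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Content type of the main part -> MIME type of the whole package (common cases only).
MAIN_PART_TO_PACKAGE = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def ooxml_mime_from_content_types(file_obj):
    """Return the package MIME type found via [Content_Types].xml, or None."""
    with zipfile.ZipFile(file_obj) as archive:
        try:
            root = ET.fromstring(archive.read("[Content_Types].xml"))
        except KeyError:  # member missing: not an OOXML package
            return None
    for override in root.iter(CT_NS + "Override"):
        package_type = MAIN_PART_TO_PACKAGE.get(override.get("ContentType"))
        if package_type:
            return package_type
    return None
```
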

The word folder contains:

_rels (folder)
media (folder)
theme (folder)
document.xml
endnotes.xml
fontTable.xml
footer1.xml
footnotes.xml
settings.xml
styles.xml
webSettings.xml

So, if you change the extension of an MS Office file from .docx, .xlsx, or .pptx to .zip, it can be opened normally in Mayan-EDMS and previewed, OCR'd, etc. For .docx files there are three Python packages that can extract text from them: docx2txt, docx, and docx2python.
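
At their core, such extractors read `word/document.xml` from the archive and collect the text runs. A simplified, standard-library-only sketch of that idea, not the API of docx2txt, docx, or docx2python; `W_NS` and `docx_plain_text` are hypothetical names, and tabs, line breaks, headers, footers, footnotes, and text boxes get no special handling:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(file_obj):
    """Return the body text of a .docx, one line per w:p paragraph (simplified)."""
    with zipfile.ZipFile(file_obj) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Concatenate the text of every w:t run inside this paragraph.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```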
