How to search through HTML files and emails?

hejko · January 18, 2025, 5:20pm

Good evening,

Mayan does not include the content of HTML files and HTML emails in the file content field. As a result, these files do not show up in search results and have no file preview.

To solve this, I had the idea of creating a workflow that checks whether the MIME type equals “text/html” and, if so, adds the actual text content of the file to the OCR content field.
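
For illustration, the text-extraction step of such a workflow could look roughly like the sketch below. It is plain Python using only the standard library's `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical, and nothing here calls Mayan's workflow or OCR API, so how the result would be written to the OCR content field is left open.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def get_text(self):
        return "\n".join(self._chunks)


def html_to_text(html, mimetype="text/html"):
    """Return the text of an HTML string, or "" for any other MIME type.

    Hypothetical helper: the mimetype argument stands in for whatever
    value the workflow condition would check.
    """
    if mimetype != "text/html":
        return ""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

Calling `html_to_text(raw_html)` on the decoded file content would give a newline-separated string that could then be stored as searchable text. HTML emails would first need their `text/html` part extracted and decoded, which this sketch does not cover.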

But while playing around with the sandbox, I noticed that some helpful variables do not return anything at all, even on plain text files.

Any idea why? Or any other solution?

Thank you for your help!

Edit:
I noticed that image files do not have any OCR content either.
This is on a new Docker install.