Feature Follow-up: Search terms highlighting in docs

mwmentor
Posts: 2
Joined: Thu Sep 05, 2019 8:45 pm

Feature Follow-up: Search terms highlighting in docs

Post by mwmentor »

Hi there:

I have recently been evaluating Mayan as a document storage solution for my agency. It has some great features that I have seen already, and many more that I have yet to find. I am particularly pleased that it can OCR and parse content from images as well as PDFs, which will be very useful in my environment. So great :)

Something that I would really appreciate is being able to see both search terms and phrases highlighted on the pages where they occur. I see that this is planned for a future release, and I wanted to find out what sort of time frame is planned for it. It doesn't need to be specific; a general idea is fine.

Thanks so much.
-Michael
rosarior
Developer
Posts: 688
Joined: Tue Aug 21, 2018 3:28 am
Location: Puerto Rico

Re: Feature Follow-up: Search terms highlighting in docs

Post by rosarior »

Hi, thanks for the feedback.

When Mayan was integrated with the Tesseract OCR engine, the hOCR standard was in its infancy. hOCR is now more mature and we are considering moving the code to support it. hOCR records the recognized text together with its position on the page. With that position it will be possible to add search result highlights.
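
To illustrate the idea, here is a minimal sketch (not Mayan code) of how word positions in hOCR output could be turned into highlight rectangles. It assumes the usual hOCR layout in which each word is a `span` with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`. The names `HocrWordParser`, `find_highlights` and `SAMPLE` are made up for this example.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from ocrx_word spans in an hOCR page."""

    def __init__(self):
        super().__init__()
        self.words = []    # list of (text, (x0, y0, x1, y1))
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def find_highlights(hocr, term):
    """Return the bounding boxes of words equal to term (case-insensitive)."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    term = term.casefold()
    return [bbox for text, bbox in parser.words if text.casefold() == term]


# Hand-written sample in the hOCR style, for illustration only.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <span class='ocr_line' title='bbox 10 10 300 40'>
    <span class='ocrx_word' title='bbox 10 10 90 40; x_wconf 96'>Invoice</span>
    <span class='ocrx_word' title='bbox 100 10 180 40; x_wconf 93'>total</span>
  </span>
</div>
"""

print(find_highlights(SAMPLE, "invoice"))
```

The returned rectangles are in page image pixel coordinates, so a viewer could draw them as overlays on the rendered page. Phrase matching and words split across lines are left out of this sketch.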

The hOCR implementation will take considerable effort and changes to the data structures. There is no specific timeline for the implementation.

It would be possible to prioritize this if interested parties funded the effort, as that would allow us to hire developers for it.
mwmentor
Posts: 2
Joined: Thu Sep 05, 2019 8:45 pm

Re: Feature Follow-up: Search terms highlighting in docs

Post by mwmentor »

Hi Rosario:

Thanks for letting me know... that sounds fine. Thanks for a great product :)

-Michael
chris75vie
Posts: 1
Joined: Thu Apr 21, 2022 5:34 am

Re: Feature Follow-up: Search terms highlighting in docs

Post by chris75vie »

Hi Rosario,
I've recently installed Mayan EDMS and I like it very much. I'm also fascinated by the search functionality based on the OCR. I have very large documents, so highlighting search terms in the docs would be marvelous. I'm a software developer and I'd like to contribute to that feature. Could you give me a hint on where to start?
Kind regards, Chris
bwakkie
50 Posts
Posts: 73
Joined: Fri Feb 14, 2020 8:28 pm

Re: Feature Follow-up: Search terms highlighting in docs

Post by bwakkie »

See my request in GitLab: https://gitlab.com/mayan-edms/mayan-edms/-/issues/833

Personally, I would prefer something like pdf.js, which already has highlight functionality. The problem is that Mayan breaks the document up into images.

It may also be interesting that PostgreSQL RUM indexes store the positions of words. I believe that is not the position in the image or page, though.
michael
Developer
Posts: 297
Joined: Sun Apr 19, 2020 6:21 am

Re: Feature Follow-up: Search terms highlighting in docs

Post by michael »

Hello,

Providing highlights for search results has been a topic of discussion. The problem is that OCR is not the only way documents can be searched in Mayan, so the highlight feature needs to work for any search term type. We have a working solution, but since it is very disruptive it will be released in phases.

The first phase will ship in version 4.4, which includes more direct integration with the backing search engines. This is an internal interface change only.

The next phase will be to change the search API in version 5.0 to return search results instead of just documents. Each search result object will contain the document along with other search-related information, such as scoring and highlighting.

The final phase will be to integrate the API changes in the UI. Depending on scheduling this will happen in version 5.0 or 5.1.
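
As a rough illustration of the second phase, here is a sketch of what a search result object wrapping a document could look like. This is not the actual Mayan API: `Highlight`, `SearchResult`, `wrap_hits` and the shape of the raw backend hits are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Highlight:
    """A matched fragment and the (hypothetical) field it came from."""
    field_name: str
    fragment: str


@dataclass
class SearchResult:
    """Hypothetical result object: the document plus search metadata."""
    document_id: int
    score: float
    highlights: list = field(default_factory=list)


def wrap_hits(hits):
    """Convert raw backend hits into SearchResult objects, best score first.

    Each hit is assumed to be a dict with "id", "score" and an optional
    "highlight" mapping of field name to a list of fragments.
    """
    results = []
    for hit in hits:
        highlights = [
            Highlight(field_name=name, fragment=fragment)
            for name, fragments in hit.get("highlight", {}).items()
            for fragment in fragments
        ]
        results.append(
            SearchResult(
                document_id=hit["id"],
                score=hit["score"],
                highlights=highlights,
            )
        )
    return sorted(results, key=lambda result: result.score, reverse=True)


# Made-up hits, for illustration only.
hits = [
    {"id": 1, "score": 0.4,
     "highlight": {"content": ["the <em>invoice</em> total"]}},
    {"id": 2, "score": 0.9},
]
for result in wrap_hits(hits):
    print(result.document_id, result.score, len(result.highlights))
```

Returning result objects rather than bare documents gives the UI a place to read scoring and highlight data from, regardless of whether the match came from OCR text, metadata, or another field.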

Cheers!