Feature Request: In search results show where in the document the criteria was found

nate7475
Posts: 3
Joined: Fri Aug 23, 2019 11:30 pm


Post by nate7475 » Fri Aug 23, 2019 11:44 pm

Hi,
Perhaps this already exists, or I am trying to use the software in a way that is not intended, but currently when I do a search I see all of the documents that the word/phrase is found in (which is great). What I don't know is where in each document it was found.

So if I get 4 search results and they are each 10-80 page documents, I then have to open each document in OCR mode, search for the phrase again with Chrome, see what page(s) it was found on, flip back to preview mode, and go to that page in the PDF/document to view the search result.

I read the documentation available online on Read the Docs and bought the pre-release book, and did not see any information on how to view where in each document the results were found (btw, great documentation with the book, although screenshots would be awesome as that always helps with comprehension). Thanks,

Nate

rosarior
Posts: 387
Joined: Tue Aug 21, 2018 3:28 am

Re: Feature Request: In search results show where in the document the criteria was found

Post by rosarior » Sat Aug 24, 2019 11:55 pm

Hi,

Thank you for purchasing a copy of the book! I wanted to add screenshots, but the large number required and the fast-changing nature of the UI mean it would only be possible to add them using automation. So far, the few automatic screenshot plug-ins for LaTeX I've found didn't work.

The reason we have not been able to add result highlights is that we don't have the exact coordinates of the OCR text in relation to the image. For that we need to switch the entire OCR system to work with hOCR, which does provide coordinates. When Mayan first came out 8 years ago, hOCR was in its infancy and not fully implemented. Tesseract now has mature hOCR support, but the way it is implemented currently conflicts with the test implementation of Zone OCR we are developing.
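To illustrate what hOCR coordinates would make possible, here is a minimal sketch, not Mayan's code. It assumes Tesseract-style hOCR markup in which each word is a `span` with class `ocrx_word` and a `title` attribute carrying `bbox x0 y0 x1 y1`. The names `HocrWordParser`, `parse_bbox`, and `find_word_boxes` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title such as 'bbox 10 5 60 30; x_wconf 95'."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(value) for value in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            # Words without a parsable bbox are skipped, since _bbox stays None.
            self._bbox = parse_bbox(attributes.get("title") or "")
            self._chunks = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._chunks).strip(), self._bbox))
            self._bbox = None


def find_word_boxes(hocr, query):
    """Return the bounding boxes of words equal to the query, ignoring case."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    needle = query.casefold()
    return [bbox for text, bbox in parser.words if text.casefold() == needle]
```

A highlight overlay could then draw a rectangle at each returned box on the page image. Matching whole phrases would need extra logic to combine adjacent word boxes.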

One partial solution would be to at least tell the user the page numbers where the result was found. Would that work?
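The page-number idea can be sketched roughly as follows, assuming the OCR text is available as one string per page. This is only an illustration: `pages_containing` is a hypothetical helper, not part of Mayan, and it does a plain case-insensitive substring match rather than using the real search backend.

```python
def pages_containing(page_texts, query):
    """Return the 1-based page numbers whose OCR text contains the query."""
    needle = query.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text.casefold()
    ]
```

For example, given the page texts `["Invoice total", "nothing here", "TOTAL due"]` and the query `"total"`, the function returns pages 1 and 3.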

Thanks again for purchasing a copy of the book!

nate7475
Posts: 3
Joined: Fri Aug 23, 2019 11:30 pm

Re: Feature Request: In search results show where in the document the criteria was found

Post by nate7475 » Sun Aug 25, 2019 7:53 am

The page number(s) would certainly make it more convenient, but if I'm the only one asking about it, then I wouldn't worry about it.

I was hoping that showing the 50 words before and the 50 words after each result wouldn't be too difficult, but alas, I know nothing about Python.
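For illustration, the "50 words before and after" idea could look something like the sketch below. It assumes the OCR text of a document or page is available as a plain string, and `context_snippets` is a hypothetical name, not an existing Mayan function. Words are compared case-insensitively with surrounding punctuation stripped.

```python
import re


def _normalize(word):
    """Lowercase a word and strip leading/trailing punctuation for comparison."""
    return re.sub(r"^\W+|\W+$", "", word).casefold()


def context_snippets(text, query, window=50):
    """Return each match of the query with up to `window` words on either side."""
    words = text.split()
    query_words = [_normalize(word) for word in query.split()]
    count = len(query_words)
    if count == 0:
        return []
    normalized = [_normalize(word) for word in words]
    snippets = []
    for index in range(len(words) - count + 1):
        if normalized[index:index + count] == query_words:
            start = max(0, index - window)
            end = min(len(words), index + count + window)
            snippets.append(" ".join(words[start:end]))
    return snippets
```

With `window=2`, searching for `"four"` in `"one two three four five six seven"` yields the single snippet `"two three four five six"`.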

Thanks for all your work on this project; I watched the video you did on YouTube about it with Floss a few years back--really neat evolution.
