
Version 3.5 - How to populate the new search backend (whoosh)

Posted: Mon Oct 05, 2020 4:03 pm
by clews
Hi!

I just upgraded my Mayan installation (deployed) to the new version 3.5, mainly for the new search backend, because the old native one was getting really slow with the number of documents I'm trying to manage. But right now, all my search results are empty. I retriggered parsing and indexing, but it does not seem to populate the Whoosh index. How can I check whether indexing is done?

I'd also like to take this opportunity to congratulate you all on the wonderful work you do! Mayan is just awesome!

Cheers,
Clews.

Re: Version 3.5 - How to populate the new search backend (whoosh)

Posted: Wed Oct 07, 2020 11:09 pm
by rosarior
Hi,

After checking that the new backend setting is being picked up, go to "Tools" -> "Reindex Search Backend" and select "Yes". This schedules a background task that goes over each document in the system, extracts the value of every defined search field, and sends those values to the search backend.
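As a rough illustration of what such a background task does, here is a minimal Python sketch. It is not Mayan's code: `Document`, `SEARCH_FIELDS`, `reindex` and the plain dict standing in for the search backend are hypothetical names chosen for illustration. Yielding a (done, total) counter after each document is one simple way a task like this could report how far indexing has progressed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for a document record (not Mayan's model)."""
    id: int
    label: str
    content: str
    tags: List[str] = field(default_factory=list)


# Hypothetical search-field definitions: field name -> value extractor.
SEARCH_FIELDS: Dict[str, Callable[[Document], str]] = {
    "label": lambda d: d.label,
    "content": lambda d: d.content,
    "tags": lambda d: " ".join(d.tags),
}


def reindex(
    documents: List[Document], backend: Dict[int, Dict[str, str]]
) -> Iterator[Tuple[int, int]]:
    """Extract every defined field of every document and send it to the
    backend (a plain dict here), yielding (done, total) after each one."""
    total = len(documents)
    for done, document in enumerate(documents, start=1):
        backend[document.id] = {
            name: extract(document) for name, extract in SEARCH_FIELDS.items()
        }
        yield done, total


docs = [
    Document(1, "invoice.pdf", "This is the best translation so far.", ["finance"]),
    Document(2, "notes.txt", "Meeting notes"),
]
index: Dict[int, Dict[str, str]] = {}
for done, total in reindex(docs, index):
    print(f"indexed {done}/{total}")  # prints "indexed 1/2", then "indexed 2/2"
```

Until the counter reaches the total, documents that have not been visited yet simply do not exist in the index, which is why results fill in gradually.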

This process takes time: first to go over each field of every document, version, page, tag, etc., and then for the search backend to do its own processing (segmentation, stemming, etc.).

Full-text engines trade an expensive indexing process for fast retrieval.
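The trade-off can be shown with a toy inverted index in Python. This is a generic illustration of the technique, not Whoosh's actual data structures: the tokenizer is deliberately simplified (lowercased runs of word characters) and all function names are made up. Building the index touches every word of every document once; afterwards each query term is a single dictionary lookup instead of a scan over all documents.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Simplified analyzer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    # Expensive step, done once up front: read every token of every document.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def scan_search(docs: Dict[int, str], word: str) -> Set[int]:
    # Without an index, every query re-reads every document.
    return {doc_id for doc_id, text in docs.items() if word.lower() in tokenize(text)}


def indexed_search(index: Dict[str, Set[int]], word: str) -> Set[int]:
    # With an index, each query term is a single dictionary lookup.
    return set(index.get(word.lower(), set()))


docs = {
    1: "This is the best translation so far.",
    2: "Translation memory notes",
    3: "Unrelated text",
}
index = build_inverted_index(docs)
print(indexed_search(index, "translation"))  # {1, 2}
print(scan_search(docs, "translation"))      # {1, 2}
```

Both searches return the same documents; the difference is where the work happens, which is why a fresh reindex is slow while later searches are fast.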

Search results will continue to improve as the background task goes over more and more documents.
Attachments: 2020-10-07_19-01.png (61.58 KiB), 2020-10-07_19-01_1.png (136.7 KiB), 2020-10-07_19-01_2.png (27.16 KiB)

Re: Version 3.5 - How to populate the new search backend (whoosh)

Posted: Tue Oct 13, 2020 2:25 pm
by ceree
Hi,

I had the same problem, or maybe it's just a misunderstanding on my part.

After switching to the new search engine, I always got empty results. But I had always searched with wildcards. As soon as I enter the whole search term, I get a result.

Is it not possible to search with wildcards with the new search engine?

I have tried the following:

Document content: This is the best translation so far.

- Searching for: trans / *trans* / trans* = no results
- Searching for: translation = document found

Thanks,
Chris

Re: Version 3.5 - How to populate the new search backend (whoosh)

Posted: Fri Oct 16, 2020 7:16 am
by joh-ku
ceree wrote: Tue Oct 13, 2020 2:25 pm
Is it not possible to search with wildcards with the new search engine?

Hi folks,

I didn't have the problem of getting no search results at all, but I can confirm that the overall search behavior changed slightly with the switch to Whoosh, and that in some cases incorrect (empty) results are returned. As ceree already outlined, this is especially noticeable when wildcard patterns or fragments of the full search term are entered.

I'll update if I can provide more details.

Best regards
Johannes

Re: Version 3.5 - How to populate the new search backend (whoosh)

Posted: Sat Oct 17, 2020 5:01 pm
by rosarior
Hi,

When using the Whoosh search backend, the search query needs to be in the format supported by Whoosh. The search terms entered on the form are only minimally processed to allow integration; most of the search syntax is passed through as-is.
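The difference between a plain term and a wildcard term, as described in Whoosh's query-language documentation, can be sketched with a toy Python matcher. This is an assumption-laden simplification: it uses a naive lowercase word tokenizer and `fnmatch` for wildcard patterns, and it does not model Mayan's own query handling or Whoosh analyzer features such as stemming and stop words. It therefore illustrates the intended semantics only and is not a diagnosis of the empty wildcard results reported earlier in the thread.

```python
import fnmatch
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    # Simplified analyzer: lowercase whole words.
    return set(re.findall(r"\w+", text.lower()))


def term_matches(text: str, term: str) -> bool:
    """Toy version of query-term matching against indexed tokens."""
    indexed = tokens(text)
    term = term.lower()
    if "*" in term or "?" in term:
        # Wildcard term: the pattern is compared with each indexed token.
        return any(fnmatch.fnmatchcase(token, term) for token in indexed)
    # Plain term: it must equal one whole indexed token.
    return term in indexed


content = "This is the best translation so far."
for query in ["trans", "translation", "trans*", "*trans*"]:
    print(query, term_matches(content, query))
```

In this model a fragment such as `trans` only matches when it is written as a wildcard term; a database substring search, by contrast, would match the fragment directly.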

Try different combinations of quotes and asterisks, with the asterisks placed both inside and outside the quotes.

With your feedback, we can then add pre-processing to make the Whoosh backend behave more like the default database backend.
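One possible shape for such pre-processing is sketched below in Python. The `add_wildcards` helper is hypothetical and not part of Mayan or Whoosh: it wraps bare terms in `*...*` so they roughly imitate a substring search, while leaving quoted phrases, boolean operators, `field:value` terms, parenthesized groups and terms that already contain wildcards untouched. The operator list is an assumption based on Whoosh's default query syntax. Leading wildcards force the engine to check many more index terms, so this trades speed for familiarity.

```python
import re

# Quoted phrases, or runs of non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')
# Boolean operators assumed from Whoosh's default query syntax.
OPERATORS = {"AND", "OR", "NOT", "ANDNOT", "ANDMAYBE"}


def add_wildcards(query: str) -> str:
    """Hypothetical pre-processing: wrap bare terms in '*...*' so they act
    like substring matches. Quoted phrases, operators, field:value terms,
    groups and terms that already contain wildcards are left unchanged."""
    parts = []
    for token in TOKEN_RE.findall(query):
        if token.startswith('"') or token in OPERATORS or any(c in token for c in "*?:()"):
            parts.append(token)
        else:
            parts.append(f"*{token}*")
    return " ".join(parts)


print(add_wildcards("trans"))                                 # *trans*
print(add_wildcards('StrangeString AND "best translation"'))  # *StrangeString* AND "best translation"
```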

https://whoosh.readthedocs.io/en/latest/querylang.html

Re: Version 3.5 - How to populate the new search backend (whoosh)

Posted: Sun Oct 18, 2020 9:05 am
by joh-ku
Hi,

Thanks for clarification, that explains the differences I noticed.

Whoosh, for instance, does not find a file named "StrangeString_EVEN_LONGER_NAME.pdf" when I search for "StrangeString" (assuming this string occurs only in this filename and nowhere else). However, it does find the document when I search with a wildcard, like "StrangeString*", which is at least technically correct behavior. This means I now have to enter wildcards manually. Apart from that, after an observation period of three days, Whoosh works just fine and is pleasingly fast.
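A plausible explanation is tokenization. The Python sketch below assumes the field is tokenized with a pattern like the one Whoosh's `RegexTokenizer` uses by default, `\w+(\.?\w+)*`, followed by lowercasing; Mayan's actual field analyzers may differ. Because `\w` includes the underscore and the pattern allows dotted continuations, the whole filename becomes a single token, so the plain term `strangestring` equals no token while a prefix match on it succeeds.

```python
import re
from typing import List

# Assumption: a tokenizer pattern like Whoosh's default RegexTokenizer,
# with lowercasing applied afterwards (stop-word removal is omitted).
TOKEN_PATTERN = re.compile(r"\w+(\.?\w+)*")


def tokenize(text: str) -> List[str]:
    # Use group(0) so the whole match is returned, not the inner group.
    return [m.group(0).lower() for m in TOKEN_PATTERN.finditer(text)]


tokens = tokenize("StrangeString_EVEN_LONGER_NAME.pdf")
print(tokens)                                               # ['strangestring_even_longer_name.pdf']
print("strangestring" in tokens)                            # False: plain term needs a whole token
print(any(t.startswith("strangestring") for t in tokens))  # True: prefix term matches
```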

Thanks for your support and all the effort you put into this outstanding project!

Best regards
Johannes