Version 3.5 - How to populate the new search backend (whoosh)

clews
Posts: 1
Joined: Mon Oct 05, 2020 3:56 pm

Version 3.5 - How to populate the new search backend (whoosh)

Post by clews »

Hi!

I just upgraded my Mayan installation (deployed) to the new version 3.5, mainly for the new search backend, because the old native one was getting really slow with the number of documents I am trying to manage. But right now all my search results are empty. I retriggered parsing and indexing, but that does not seem to populate the Whoosh index. How can I check whether indexing is done?

I also want to take this occasion to congratulate you all on the wonderful work you do! Mayan is just awesome!

Cheers,
Clews.
rosarior
Developer
Posts: 582
Joined: Tue Aug 21, 2018 3:28 am
Location: Puerto Rico

Re: Version 3.5 - How to populate the new search backend (whoosh)

Post by rosarior »

Hi,

After checking that the new backend setting is being picked up, go to "Tools" -> "Reindex Search Backend" and select "Yes". This schedules a background task that goes over each document in the system, extracts the value of every defined field, and sends it to the search backend.

This process takes time: first to go over each field of every document, version, page, tag, etc., and then for the search backend to do its own processing (segmentation, stemming, etc.).

Full-text engines get their search speed by accepting an expensive indexing process in exchange for fast retrieval.

Search results will continue to improve as the background task goes over more and more documents.
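
To make the trade-off described above concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy model, not Mayan's or Whoosh's actual code: the names `tokenize`, `build_index`, and `search`, as well as the sample documents, are assumptions made up for illustration. The point is only that indexing has to walk every field of every document, while a lookup afterwards is a single dictionary access.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing (the slow part): visit every field of every document once."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for value in fields.values():
            for term in tokenize(value):
                index[term].add(doc_id)
    return index


def search(index, term):
    """Retrieval (the fast part): one dictionary lookup per term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample data, not taken from a real installation.
documents = {
    1: {"label": "notes.pdf", "content": "This is the best translation so far."},
    2: {"label": "invoice.pdf", "content": "Payment is due in 30 days."},
}

index = build_index(documents)
print(search(index, "translation"))  # [1]
print(search(index, "is"))           # [1, 2]
```

While the background task is still running, only the documents processed so far are in the index, which is why results fill in gradually.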
Attachment: 2020-10-07_19-01.png (61.58 KiB)
Attachment: 2020-10-07_19-01_1.png (136.7 KiB)
Attachment: 2020-10-07_19-01_2.png (27.16 KiB)
ceree
Posts: 1
Joined: Tue Oct 13, 2020 2:22 pm

Re: Version 3.5 - How to populate the new search backend (whoosh)

Post by ceree »

Hi,

I had the same problem, or maybe just a question of understanding.

After switching to the new search engine I always got empty results. But I had always searched with wildcards. As soon as I enter the whole search term, I get a result.

Is it not possible to search with wildcards with the new search engine?

I have tried the following:

Document content: This is the best translation so far.

- Searching for: trans / *trans* / trans* = no results
- Searching for: translation = document found

Thanks,
Chris
joh-ku
Posts: 6
Joined: Thu Jun 04, 2020 11:31 am

Re: Version 3.5 - How to populate the new search backend (whoosh)

Post by joh-ku »

Hi folks,

I didn't run into the problem of getting no search results at all, but I second that the overall search behavior changed slightly with the switch-over to Whoosh and that in some cases incorrect (empty) results are returned. As ceree already outlined, this is especially noticeable when regex or fragments of the full search term are entered.

I'll update if I can provide more details.

Best regards
Johannes
rosarior
Developer
Posts: 582
Joined: Tue Aug 21, 2018 3:28 am
Location: Puerto Rico

Re: Version 3.5 - How to populate the new search backend (whoosh)

Post by rosarior »

Hi,

When using the Whoosh search backend, the search query format needs to be the one supported by Whoosh. The search terms entered on the form are minimally processed to allow integration but most of the search syntax is passed as is.

Try combining quotes and asterisks, placing the asterisks both outside and inside the quotes.

With your feedback we can then add additional pre-processing to make the Whoosh backend behavior similar to that of the default database backend.

https://whoosh.readthedocs.io/en/latest/querylang.html
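
As an illustration of what such additional pre-processing could look like, here is a rough sketch in plain Python. It is not Mayan's implementation: the function name `add_prefix_wildcards` and the rules it applies are assumptions chosen for this example. It appends a trailing `*` to every bare term and leaves quoted phrases, the operators AND, OR and NOT, and terms that already contain a wildcard untouched.

```python
import re

# Boolean operators that should be passed through unchanged.
OPERATORS = {"AND", "OR", "NOT"}


def add_prefix_wildcards(query):
    """Turn bare terms into prefix searches, e.g. 'trans' -> 'trans*'.

    Hypothetical helper for illustration only.
    """
    # Split into quoted phrases and whitespace-separated tokens.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    processed = []
    for token in tokens:
        is_quoted = token.startswith('"')
        has_wildcard = "*" in token or "?" in token
        if is_quoted or has_wildcard or token in OPERATORS:
            processed.append(token)        # leave as entered
        else:
            processed.append(token + "*")  # bare term becomes a prefix query
    return " ".join(processed)


print(add_prefix_wildcards('trans AND "so far" best*'))
# trans* AND "so far" best*
```

Only a trailing wildcard is added in this sketch; whether a leading one would also be wanted to mimic the database backend is left open here.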
joh-ku
Posts: 6
Joined: Thu Jun 04, 2020 11:31 am

Re: Version 3.5 - How to populate the new search backend (whoosh)

Post by joh-ku »

Hi,

Thanks for the clarification, that explains the differences I noticed.

For instance, Whoosh does not find the file "StrangeString_EVEN_LONGER_NAME.pdf" when I search for "StrangeString" (assuming this string occurs only in this filename and nowhere else). However, it does find the document when I search with a wildcard, like "StrangeString*", which is at least the technically correct behavior. This means I now have to enter wildcards manually. Apart from that, Whoosh works just fine and is pleasingly fast after an observation period of three days.
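
The behavior described here is what one would expect from a term-based index. The following plain-Python sketch models it under one loud assumption: that the whole filename ends up in the index as a single lowercase term. How Mayan and Whoosh actually analyze the field may differ, and `matching_terms` is a made-up helper, not a Whoosh call.

```python
from fnmatch import fnmatchcase


def matching_terms(indexed_terms, query):
    """Return the indexed terms a query matches in this toy model.

    Without a wildcard the query must equal a whole term;
    with '*' or '?' it is matched as a pattern against each term.
    """
    q = query.lower()
    if "*" in q or "?" in q:
        return sorted(t for t in indexed_terms if fnmatchcase(t, q))
    return sorted(t for t in indexed_terms if t == q)


# Assumption: the filename is stored as one term, not split into parts.
terms = {"strangestring_even_longer_name.pdf", "translation", "best"}

print(matching_terms(terms, "StrangeString"))   # []
print(matching_terms(terms, "StrangeString*"))  # ['strangestring_even_longer_name.pdf']
```

The database backend presumably matched substrings instead, which would explain why the fragment alone used to be enough.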

Thanks for your support and all the effort you put into this outstanding project!

Best regards
Johannes