Problem with elasticsearch

germain
Posts: 4
Joined: Tue Jul 26, 2022 12:32 am
Location: Canada

Post by germain »

When I search for documents using a term made of several words, most of the results are irrelevant and don't contain any of those words.


Search would probably be fine with only a "match" query. You can add fuzziness there.

I didn't know that search supported regular expressions. You could use the wildcard query only when you detect ? or * (maybe other characters) in the search text. Same idea for regular expressions: if you detect regex control characters in the search text, use the regexp query.
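To make the idea concrete, here is a minimal Python sketch of picking one query type from the search text. The function names and the exact character sets are my own assumptions, not something taken from Mayan; "." is deliberately left out of the regex set because it appears in ordinary file names.

```python
# Hypothetical helper: choose a single Elasticsearch query type per search text.
# The character sets below are an assumption; adjust them to taste.
WILDCARD_CHARS = set("*?")
REGEX_CHARS = set("[](){}|+\\")  # "." is excluded on purpose (common in file names)


def choose_query_type(text):
    """Return "regexp", "wildcard" or "match" depending on the characters used."""
    if any(ch in REGEX_CHARS for ch in text):
        return "regexp"
    if any(ch in WILDCARD_CHARS for ch in text):
        return "wildcard"
    return "match"


def build_clause(field, text):
    """Build one query clause for one field instead of four clauses per field."""
    kind = choose_query_type(text)
    if kind == "match":
        # A match query analyzes the text and accepts a fuzziness setting.
        return {"match": {field: {"query": text, "fuzziness": "AUTO"}}}
    # wildcard and regexp are term-level queries and both take "value".
    return {kind: {field: {"value": text}}}


if __name__ == "__main__":
    for sample in ("active directory", "activ*", "act(ive|or)"):
        print(sample, "->", build_clause("label", sample))
```

With this, a plain search like "active directory" only produces a match clause, and the wildcard or regexp clause is only generated when the user actually typed such a pattern.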

Here's the request sent to the Elasticsearch backend (on mayan-documentsearchresult) for the words "active directory". There are many useless clauses (on id, datetime_created, uuid, checksum). I know nobody searches on id, uuid or checksum. Maybe on datetime (who knows).

{
"from": 0,
"size": 100,
"query": {
"bool": {
"should": [
{
"fuzzy": {
"id": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"id": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"id": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"id": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"document_type__label": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"document_type__label": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"document_type__label": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"document_type__label": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"datetime_created": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"datetime_created": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"datetime_created": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"datetime_created": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"label": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"label": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"label": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"label": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"description": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"description": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"description": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"description": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"uuid": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"uuid": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"uuid": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"uuid": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"files__checksum": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"files__checksum": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"files__checksum": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"files__checksum": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"files__filename": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"files__filename": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"files__filename": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"files__filename": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"files__mimetype": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"files__mimetype": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"files__mimetype": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"files__mimetype": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"cabinets__label": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"cabinets__label": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"cabinets__label": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"cabinets__label": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"comments__text": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"comments__text": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"comments__text": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"comments__text": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"files__file_pages__content__content": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"files__file_pages__content__content": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"files__file_pages__content__content": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"files__file_pages__content__content": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"workflows__log_entries__comment": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"workflows__log_entries__comment": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"workflows__log_entries__comment": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"workflows__log_entries__comment": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"files__file_metadata_drivers__entries__key": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"files__file_metadata_drivers__entries__key": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"files__file_metadata_drivers__entries__key": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"files__file_metadata_drivers__entries__key": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"files__file_metadata_drivers__entries__value": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"files__file_metadata_drivers__entries__value": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"files__file_metadata_drivers__entries__value": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"files__file_metadata_drivers__entries__value": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"metadata__metadata_type__name": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"metadata__metadata_type__name": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"metadata__metadata_type__name": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"metadata__metadata_type__name": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"metadata__value": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"metadata__value": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"regexp": {
"metadata__value": {
"value": "active directory",
"flags_value": 255,
"max_determinized_states": 10000,
"boost": 1
}
}
},
{
"wildcard": {
"metadata__value": {
"wildcard": "active directory",
"boost": 1
}
}
},
{
"fuzzy": {
"versions__version_pages__ocr_content__content": {
"value": "active directory",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true,
"boost": 1
}
}
},
{
"match": {
"versions__version_pages__ocr_content__content": {
"query": "active directory",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}




The rest is omitted here because the post would be too long... look at the attachment.
Attachments
indent_request.txt.gz
(1019 Bytes) Downloaded 9 times
Germain, security specialist

Re: Problem with elasticsearch

Post by germain »

Hi,

I fixed the search in Elasticsearch and added some mappings. It may need further changes to work the way you want. Feel free to ask. The best option would be to have these settings directly in the Mayan backend. Elasticsearch is such a beast.

Anyway, I now use only one type of search for each field. Most of the fields use "match". There are different kinds of search (match, fuzzy, regexp and wildcard), but it's better not to use them all at the same time because you'll get a lot of irrelevant results.

files__filename uses fuzzy search. I mostly search on file names, so this suits me well.

I check for certain fields and skip searching in them because they are irrelevant. Who searches for a uuid?
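For anyone who doesn't want to open the attachment, the approach can be sketched roughly like this in Python. This is not the attached code: the names `SKIPPED_FIELDS`, `FUZZY_FIELDS` and `build_search_body` are made up for illustration, the skip list is my guess based on the fields mentioned in this thread, and I'm assuming "fuzzy search" means a match query with fuzziness rather than a term-level fuzzy query.

```python
# Hypothetical sketch: one query clause per field, irrelevant fields skipped.
SKIPPED_FIELDS = {"id", "uuid", "files__checksum"}  # assumed skip list
FUZZY_FIELDS = {"files__filename"}  # fields searched with fuzziness


def build_search_body(fields, text, size=100):
    """Return a bool/should search body with a single clause per kept field."""
    should = []
    for field in fields:
        if field in SKIPPED_FIELDS:
            continue  # nobody searches free text in these fields
        options = {"query": text}
        if field in FUZZY_FIELDS:
            # match analyzes the text first, then applies fuzziness per token
            options["fuzziness"] = "AUTO"
        should.append({"match": {field: options}})
    return {"from": 0, "size": size, "query": {"bool": {"should": should}}}


if __name__ == "__main__":
    import json

    body = build_search_body(
        ["id", "label", "description", "uuid", "files__filename"],
        "active directory",
    )
    print(json.dumps(body, indent=2))
```

For the five fields in the example this produces three clauses instead of twenty, which is the main reason the results get more relevant.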

I remapped files__filename to use the camel analyzer. This way, a file name is split into more tokens: when the analyzer detects a case change, it treats what follows as another word. For example:
MooseX::FTPClass2_beta -> moose, x, ftp, class, 2, beta
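
Here is a rough sketch of what such a remapping can look like, written as a Python dict. The pattern follows the CamelCase example in the Elasticsearch pattern analyzer documentation; whether the attached file uses exactly this pattern and this mapping is an assumption. The `camel_tokens` function is only a local, ASCII-only approximation for trying out file names without a cluster; it is not what Elasticsearch runs.

```python
import re

# Java regex from the CamelCase example of the "pattern" analyzer docs.
# It is only passed through to Elasticsearch; Python never compiles it.
CAMEL_PATTERN = (
    r"([^\p{L}\d]+)"
    r"|(?<=\D)(?=\d)"
    r"|(?<=\d)(?=\D)"
    r"|(?<=[\p{L}&&[^\p{Lu}]])(?=\p{Lu})"
    r"|(?<=\p{Lu})(?=\p{Lu}[\p{L}&&[^\p{Lu}]])"
)

# Assumed index body: analyzer definition plus the files__filename mapping.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {"camel": {"type": "pattern", "pattern": CAMEL_PATTERN}}
        }
    },
    "mappings": {
        "properties": {"files__filename": {"type": "text", "analyzer": "camel"}}
    },
}

# Local approximation: acronym runs, capitalized or lowercase words, digit runs.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def camel_tokens(name):
    """Approximate the camel analyzer output for an ASCII file name."""
    return [token.lower() for token in _TOKEN_RE.findall(name)]


if __name__ == "__main__":
    print(camel_tokens("MooseX::FTPClass2_beta"))
    # ['moose', 'x', 'ftp', 'class', '2', 'beta']
```

An existing text field cannot switch analyzers in place, so a change like this normally means creating the index with the new settings and reindexing the documents.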
Attachments
elasticsearch_new.py.gz
(2.38 KiB) Downloaded 14 times
Germain, security specialist