Mayan wrongly shows different index entries for identical node values

carvvf · June 11, 2024, 10:58am

Hi all,
it seems that Mayan wrongly shows different index entries for identical node values, depending on whether the document metadata is set manually or automatically by a workflow, even though the values are exactly the same.

The index has been duly reset and rebuilt several times.

Any suggestions about this strange behaviour?

roberto.rosario · June 12, 2024, 6:28am

Hi,

Something else must be the cause because value uniqueness is enforced at the database level:

Here is the test. Attempting to create the same value under the same parent will raise an IntegrityError and won’t be allowed by the database.
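The database-level rule can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table name `index_node` and its columns are hypothetical and do not reflect Mayan's real schema. The sketch only shows that a UNIQUE constraint compares exact strings: an identical value is rejected, while a value that merely looks identical is accepted as a new row.

```python
import sqlite3

# Hypothetical table, NOT Mayan's actual schema: it only mimics a
# "value must be unique under the same parent" rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE index_node (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL,
        value TEXT NOT NULL,
        UNIQUE (parent_id, value)
    )
    """
)

insert = "INSERT INTO index_node (parent_id, value) VALUES (?, ?)"
conn.execute(insert, (1, "ABC"))

rejected = False
try:
    # Exactly the same value under the same parent: the database refuses it.
    conn.execute(insert, (1, "ABC"))
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# A value that only looks the same (leading/trailing spaces) is a different
# string, so the constraint accepts it and a second node appears.
conn.execute(insert, (1, " ABC "))

rows = conn.execute("SELECT value FROM index_node ORDER BY id").fetchall()
print(rows)  # [('ABC',), (' ABC ',)]
```

In a Django project such as Mayan the equivalent violation would surface as Django's own `IntegrityError` wrapper rather than `sqlite3.IntegrityError`, but the comparison rule is the same.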

Check that the values are indeed the same and don't just look the same. This could be a case of Unicode obfuscation, or an extra space, carriage return, or other invisible character somewhere in the text.
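One way to check whether two values are really the same is to list every character that is not plain printable ASCII, together with its code point and Unicode name. The helper `report` and the sample values below are hypothetical, a sketch using only the standard `unicodedata` module; an ordinary space is only flagged at the start or end of the value.

```python
import unicodedata


def report(text: str) -> list[str]:
    """List characters that are not plain printable ASCII (hypothetical helper)."""
    findings = []
    for i, ch in enumerate(text):
        if " " < ch <= "~":
            continue  # ordinary visible ASCII character ('!' through '~')
        if ch == " " and 0 < i < len(text) - 1:
            continue  # an inner ASCII space is usually fine
        # unicodedata.name() raises ValueError for unnamed characters such as
        # control codes, so pass a fallback instead.
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)
        findings.append(f"pos {i}: U+{ord(ch):04X} {name} ({category})")
    return findings


typed = "Invoice 2024"
from_workflow = "Invoice\u00a02024\r"  # hypothetical: NO-BREAK SPACE and a trailing CR

print(typed == from_workflow)  # False
for line in report(from_workflow):
    print(line)
# pos 7: U+00A0 NO-BREAK SPACE (Zs)
# pos 12: U+000D <unnamed> (Cc)
```

Running the same check on a value typed by a user and on one written by a workflow usually makes the difference obvious.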

carvvf · June 12, 2024, 10:44am

Hi,

I think that you are right:

As a workaround, I had to create a new workflow that is triggered by metadata changes, in order to set that very metadata again. In this way, all metadata is uniformly set in the same manner by the same workflow, regardless of who was the original editor (either a user or another workflow).

IMHO, indexes should compare what the user reads, not the character encoding behind the text. Someone may see this as a “feature”, but to me this is a full-blown bug!
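"Comparing what the user reads" could, in part, be approximated by normalizing values before comparing them. The function below is a hypothetical sketch, not something Mayan does according to this thread: it applies Unicode NFKC normalization and collapses whitespace. It also shows the limits of that approach, since some invisible and look-alike characters survive it.

```python
import unicodedata


def normalize_for_comparison(value: str) -> str:
    """Hypothetical comparison key: NFKC-normalize, then collapse whitespace."""
    # NFKC folds compatibility forms, e.g. NO-BREAK SPACE -> SPACE and
    # FULLWIDTH LATIN CAPITAL LETTER A -> A.
    folded = unicodedata.normalize("NFKC", value)
    # str.split() with no argument splits on runs of Unicode whitespace
    # (including CR) and drops leading/trailing whitespace.
    return " ".join(folded.split())


print(normalize_for_comparison(" ABC "))                # ABC
print(normalize_for_comparison("Invoice\u00a02024\r"))  # Invoice 2024
print(normalize_for_comparison("\uff21BC"))             # ABC (fullwidth A folded)

# Limits: a ZERO WIDTH SPACE is neither whitespace nor removed by NFKC,
# and a Cyrillic "Р" stays Cyrillic.
print(normalize_for_comparison("AB\u200bC") == "ABC")  # False
print(normalize_for_comparison("\u0420") == "P")       # False
```

Such folding also trades exactness for forgiveness: NFKC turns "²" into "2", which may not be wanted for every metadata field.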

Thanks again for your precious work.

roberto.rosario · June 12, 2024, 7:22pm

Hi, this is neither a feature nor a bug. It is how every software comparison works on all databases, software, and even computer languages.

This is a case of human error and unrealistic expectations.

carvvf · June 12, 2024, 9:00pm

Hi,

my point of view is different. To my knowledge, there is no single, universal way in which “software comparison works on all databases, software, and even computer languages”. Quite the contrary: there are “high level” and “low level” programming languages and techniques, depending on the purpose and scope of the application.

Document management systems like Mayan are supposed to deal mostly with office formats (doc, pdf, etc.). As such, they belong to the same “high level” category of software as office suites. In some cases, Mayan is not as user friendly as it should be, and it seems to go too deep into “low level” details that are outside its scope.

roberto.rosario · June 12, 2024, 9:39pm

That is not the point I’m making or the point of the post.

“ABC” is not the same as “ ABC ”, which seems to be what your users entered.

Also, this “Ρ”, this “Р” and this “P” may look the same to a human but they are in fact different letters.

Ρ U+03A1 GREEK CAPITAL LETTER RHO
Р U+0420 CYRILLIC CAPITAL LETTER ER
P U+0050 LATIN CAPITAL LETTER P
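The three letters above can be checked with Python's standard `unicodedata` module. This sketch confirms they are distinct code points and that even NFKC normalization does not merge them. The `scripts` helper is a hypothetical, rough heuristic that takes the first word of each letter's Unicode name; the standard library does not expose the Unicode Script property, so this is not a proper script detector.

```python
import unicodedata

letters = ["\u03a1", "\u0420", "P"]  # Greek Rho, Cyrillic Er, Latin P

for ch in letters:
    # e.g. "Ρ U+03A1 GREEK CAPITAL LETTER RHO"
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}")

# They render alike in many fonts, yet no two are equal, and normalization
# does not merge them either: they are not compatibility variants.
distinct = len(set(letters)) == 3
nfkc_distinct = len({unicodedata.normalize("NFKC", ch) for ch in letters}) == 3
print(distinct, nfkc_distinct)  # True True


def scripts(text: str) -> set[str]:
    """Rough heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


# A word mixing scripts, such as a Cyrillic "Р" followed by Latin "AY",
# is a strong hint of a look-alike character.
print(sorted(scripts("PAY")))        # ['LATIN']
print(sorted(scripts("\u0420AY")))   # ['CYRILLIC', 'LATIN']
```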

A user of your Mayan installation has either entered the metadata with spaces, an invisible character, or their keyboard or system encoding is set incorrectly.

This is neither a feature nor an error in Mayan EDMS but human error. It is something that happens in all software.

https://www.researchgate.net/publication/221462244_Fighting_unicode-obfuscated_spam

system · June 13, 2024, 9:40am

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.