Remove duplicates tool

jeverling
Posts: 5
Joined: Tue Aug 21, 2018 11:46 pm

Remove duplicates tool

Post by jeverling »

What do you think about a Remove all duplicates tool? I ended up with a large number of duplicates, and going through all of them individually will take a long time.
I think a tool that removes all duplicates automatically could be very convenient for anybody who has more than a small number of duplicates. It could offer an option to keep either the most recently added version or the oldest one, and show a warning that you might end up deleting the version with the most complete metadata.
jeverling
Posts: 5
Joined: Tue Aug 21, 2018 11:46 pm

Re: Remove duplicates tool

Post by jeverling »

I saw that you mentioned such a tool here: viewtopic.php?p=1244#p1244

Great! The next two weeks will be very busy, but after that I could help with the implementation.
rosarior
Developer
Posts: 574
Joined: Tue Aug 21, 2018 3:28 am
Location: Puerto Rico

Re: Remove duplicates tool

Post by rosarior »

I look forward to your feedback and help. Right now, what is blocking the feature is how we select which of the duplicated documents to delete and how we present that choice to users. We can't blindly offer deletion by age, because deletion by the presence or absence of metadata (or tags) might be more important to some users. The deletion code can't know anything about tags or metadata, so other apps need a way to register which parameters to display for the deletion selection.
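To make the registration idea concrete, here is a minimal sketch in plain Python. It is not Mayan EDMS code: the `Doc` class, `register_criterion`, `available_criteria` and `pick_keeper` are all hypothetical names invented for illustration. The point is only that the deletion code looks up criteria by name and never needs to know what a tag is.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    # Hypothetical stand-in for a document; not a Mayan EDMS model.
    label: str
    added: int  # e.g. a timestamp; larger means added later
    tags: Tuple[str, ...] = ()


# Hypothetical registry: criterion name -> function that scores a document.
# The copy with the highest score is the one that is kept.
_criteria: Dict[str, Callable[[Doc], float]] = {}


def register_criterion(name: str, score: Callable[[Doc], float]) -> None:
    _criteria[name] = score


def available_criteria() -> List[str]:
    # What the UI would list as choices for the user.
    return sorted(_criteria)


def pick_keeper(duplicates: List[Doc], name: str) -> Doc:
    # The deletion code only calls the registered function.
    return max(duplicates, key=_criteria[name])


# The duplicates app itself could register the age-based criteria...
register_criterion('newest', lambda d: d.added)
register_criterion('oldest', lambda d: -d.added)
# ...while a tags app registers its own, without the deletion code changing.
register_criterion('most tags', lambda d: len(d.tags))

copies = [Doc('scan-1', added=1, tags=('paid', 'tax')), Doc('scan-2', added=2)]
keeper = pick_keeper(copies, 'most tags')
```

With this shape, the list of choices shown to the user is simply whatever the installed apps have registered.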

The presentation problem is this: do we let users select a single criterion for the entire universe of duplicated documents, or do we come up with a way to group duplicated documents by a property (or metadata) so that duplicate deletion can be done differently for each group of documents?
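A rough sketch of the second option, again in plain Python with made-up names (`plan_deletions`, the `type`, `checksum`, `added` and `tags` keys are all assumptions, not Mayan EDMS fields). It assumes duplicates are only compared within the same group, and that a group without its own rule falls back to a default.

```python
from collections import defaultdict


def plan_deletions(documents, group_prop, rules, default_rule):
    """Return the documents to delete.

    documents: list of dicts (hypothetical shape, see the example below).
    group_prop: the property used to group documents, e.g. 'type'.
    rules: mapping of group value -> scoring function; highest score is kept.
    default_rule: scoring function for groups that have no entry in rules.
    """
    buckets = defaultdict(list)
    for doc in documents:
        # Copies count as duplicates only inside the same group.
        buckets[(doc[group_prop], doc['checksum'])].append(doc)

    to_delete = []
    for (group, _checksum), copies in buckets.items():
        if len(copies) < 2:
            continue  # nothing duplicated here
        rule = rules.get(group, default_rule)
        keeper = max(copies, key=rule)
        to_delete.extend(d for d in copies if d is not keeper)
    return to_delete


docs = [
    {'id': 1, 'type': 'invoice', 'checksum': 'aa', 'added': 1, 'tags': ['paid']},
    {'id': 2, 'type': 'invoice', 'checksum': 'aa', 'added': 2, 'tags': []},
    {'id': 3, 'type': 'scan', 'checksum': 'bb', 'added': 1, 'tags': []},
    {'id': 4, 'type': 'scan', 'checksum': 'bb', 'added': 2, 'tags': []},
]

# Invoices keep the copy with the most tags; everything else keeps the newest.
rules = {'invoice': lambda d: len(d['tags'])}
deleted_ids = sorted(
    d['id'] for d in plan_deletions(docs, 'type', rules, lambda d: d['added'])
)
```

The single-criterion option is the same function called with an empty `rules` mapping, so the two presentations could share one code path.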

As always, the more diverse and flexible the feature, the harder it will be to implement, and we have to balance that internal problem too.
akme
Posts: 1
Joined: Tue Aug 25, 2020 11:11 am

Re: Remove duplicates tool

Post by akme »

Would it be possible to add an option to the duplicate view to show the document checksum, so that you can sort the view and identify documents that have the same checksum but different filenames?
Also, a "select all except (1st|oldest|last)" action would be a quick way to move the rest to the trash.
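Something like the following is what I have in mind, as a standalone sketch rather than anything from the Mayan code base. The function names and the `(filename, added, content)` tuples are made up, SHA-256 is just one possible checksum, and only the "oldest" and "last" variants are shown.

```python
import hashlib
from collections import defaultdict


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption here; any stable content hash would do.
    return hashlib.sha256(data).hexdigest()


def select_all_except(files, keep='oldest'):
    """Return the filenames to move to the trash.

    files: list of (filename, added, content_bytes) tuples (hypothetical).
    keep: 'oldest' keeps the earliest copy, 'last' keeps the latest one.
    """
    groups = defaultdict(list)
    for name, added, data in files:
        # Same content ends up in the same group, whatever the filename.
        groups[checksum(data)].append((added, name))

    selected = []
    for copies in groups.values():
        copies.sort()  # oldest first
        if keep == 'oldest':
            rest = copies[1:]
        elif keep == 'last':
            rest = copies[:-1]
        else:
            raise ValueError(f'unknown keep option: {keep!r}')
        selected.extend(name for _added, name in rest)
    return selected


files = [
    ('report.pdf', 1, b'same bytes'),
    ('report (copy).pdf', 2, b'same bytes'),
    ('other.pdf', 3, b'different bytes'),
]
trash = select_all_except(files, keep='oldest')
```

Files that have no duplicate are never selected, because their group has only one entry.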
bwakkie
Posts: 18
Joined: Fri Feb 14, 2020 8:28 pm

Re: Remove duplicates tool

Post by bwakkie »

I would suggest merging in the metadata of the document with the most metadata and then removing the newer duplicates.
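As a sketch of one way to read that (plain Python, hypothetical `merge_and_prune` function and dict shape, not Mayan EDMS code): keep the oldest copy, fill its metadata starting from the copy with the most metadata, and mark every newer copy for removal. On conflicting values the richer copy wins, which is an assumption on my part.

```python
def merge_and_prune(copies):
    """Return (kept_document, removed_documents).

    copies: list of dicts with 'id', 'added' and 'metadata' keys
    (a hypothetical shape chosen for this example).
    """
    oldest = min(copies, key=lambda c: c['added'])

    # Visit the copy with the most metadata first so its values win conflicts.
    by_richness = sorted(copies, key=lambda c: len(c['metadata']), reverse=True)
    merged = {}
    for copy in by_richness:
        for key, value in copy['metadata'].items():
            merged.setdefault(key, value)

    kept = {**oldest, 'metadata': merged}
    removed = [c for c in copies if c is not oldest]
    return kept, removed


copies = [
    {'id': 1, 'added': 1, 'metadata': {'author': 'A'}},
    {'id': 2, 'added': 2, 'metadata': {'author': 'B', 'year': '2020'}},
]
kept, removed = merge_and_prune(copies)
```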

When I have duplicates with different filenames, they are not shown together in the overview, which also makes it difficult to decide manually which one to keep. [https://gitlab.com/mayan-edms/mayan-edms/-/issues/872]