Remove duplicates tool

Requests for new functionality or improvements in existing functionality. Please provide clear descriptions of your request, an example or if possible a real life scenario.
jeverling
Posts: 5
Joined: Tue Aug 21, 2018 11:46 pm

Remove duplicates tool

Post by jeverling » Mon Jun 24, 2019 4:28 pm

What do you think about a Remove all duplicates tool? I ended up with a large number of duplicates, and going through all of them individually will take a long time.
I think a tool that removes all duplicates automatically would be very convenient for anybody who has more than a small number of duplicates. It could offer an option to keep either the most recently added version or the oldest one, along with a warning that you might end up deleting the version with the most complete metadata.

jeverling
Posts: 5
Joined: Tue Aug 21, 2018 11:46 pm

Re: Remove duplicates tool

Post by jeverling » Mon Jun 24, 2019 4:45 pm

I saw that you mentioned such a tool here: viewtopic.php?p=1244#p1244

Great! The next two weeks will be very busy, but after that I could help with the implementation.

rosarior
Posts: 387
Joined: Tue Aug 21, 2018 3:28 am

Re: Remove duplicates tool

Post by rosarior » Fri Jun 28, 2019 2:48 pm

I look forward to your feedback and help. Right now, what is blocking the feature is how we select which of the duplicated documents to delete and how we present that choice to users. We can't blindly offer deletion by age, because deletion by the presence or absence of metadata (or tags) might be more important to some users. The deletion code can't know anything about tags or metadata, so that means giving other apps the ability to register which parameters to display for the deletion selection.
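
To make the registration idea concrete, here is a minimal Python sketch, not an actual implementation. Every name in it (Doc, register_criterion, available_criteria, split_keep_delete) is hypothetical and invented for illustration; nothing here is taken from the existing code base. The point is only that the core deletion code sees opaque scoring functions, while a tags or metadata app contributes its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a stored document."""
    id: int
    added: int                  # upload time, e.g. a Unix timestamp
    tags: Tuple[str, ...] = ()


# name -> (label shown to the user, scoring function).
# Within a set of duplicates, the document with the highest score is kept.
_criteria: Dict[str, Tuple[str, Callable[[Doc], float]]] = {}


def register_criterion(name: str, label: str, score: Callable[[Doc], float]) -> None:
    """Called by any app that wants to offer a way to pick the keeper."""
    _criteria[name] = (label, score)


def available_criteria() -> List[Tuple[str, str]]:
    """What the UI would list: (name, label) pairs, in registration order."""
    return [(name, label) for name, (label, _) in _criteria.items()]


def split_keep_delete(duplicates: List[Doc], name: str) -> Tuple[Doc, List[Doc]]:
    """Return the document to keep and the ones that would be deleted."""
    _, score = _criteria[name]
    keeper = max(duplicates, key=score)
    return keeper, [d for d in duplicates if d is not keeper]


# The duplicates app itself only knows about age.
register_criterion("newest", "Keep the most recently added", lambda d: d.added)
register_criterion("oldest", "Keep the oldest", lambda d: -d.added)

# A separate tags app (assumed here) registers its own criterion.
register_criterion("most_tags", "Keep the one with the most tags", lambda d: len(d.tags))
```

With this shape, split_keep_delete never imports anything tag-related; it just looks up whatever the chosen name maps to.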

The presentation problem is this: do we allow users to select one criterion for the entire universe of duplicated documents, or do we come up with a way to group duplicated documents by a property (or metadata) so that duplicate deletion can be done differently for each group of documents?
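
The two options can be compared in a short Python sketch. It assumes documents are plain dicts with "id", "type", "added" and "tags" keys, and the function name plan_deletions and its parameters are hypothetical. Passing an empty per-group mapping gives the single global criterion; filling it in gives per-group behaviour.

```python
from typing import Callable, Dict, Hashable, List

Doc = dict  # hypothetical: {"id": ..., "type": ..., "added": ..., "tags": [...]}
KeepKey = Callable[[Doc], float]  # the highest value within a duplicate set is kept


def plan_deletions(
    duplicate_sets: List[List[Doc]],
    group_of: Callable[[List[Doc]], Hashable],
    keep_key_by_group: Dict[Hashable, KeepKey],
    default_keep_key: KeepKey,
) -> List[int]:
    """Return the ids that would be deleted; nothing is actually removed."""
    to_delete: List[int] = []
    for dup_set in duplicate_sets:
        group = group_of(dup_set)
        # Fall back to the global criterion when the group has no override.
        keep_key = keep_key_by_group.get(group, default_keep_key)
        keeper = max(dup_set, key=keep_key)
        to_delete.extend(d["id"] for d in dup_set if d is not keeper)
    return to_delete


duplicate_sets = [
    [{"id": 1, "type": "invoice", "added": 1, "tags": ["paid"]},
     {"id": 2, "type": "invoice", "added": 2, "tags": []}],
    [{"id": 3, "type": "photo", "added": 1, "tags": []},
     {"id": 4, "type": "photo", "added": 5, "tags": []}],
]


def by_type(dup_set: List[Doc]) -> str:
    return dup_set[0]["type"]


def newest(doc: Doc) -> float:
    return doc["added"]


def most_tags(doc: Doc) -> float:
    return len(doc["tags"])


# One criterion for everything: keep the newest copy in every set.
global_plan = plan_deletions(duplicate_sets, by_type, {}, newest)

# Per-group: invoices keep the best-tagged copy, everything else keeps the newest.
grouped_plan = plan_deletions(duplicate_sets, by_type, {"invoice": most_tags}, newest)
```

Returning a plan instead of deleting directly also leaves room for the warning and confirmation step mentioned earlier in the thread.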

As always, the more diverse and flexible the feature, the harder it will be to implement, and we have to balance that internal problem too.
