How to clean up duplicate files in a vault

Hello Community,

We are looking for a way to find duplicates in the vault.
Our client has been working with duplicated files for several years, and now they want to clean up the vault.
Some files have the same content but different metadata; others have the same content but different workflows.

What strategy would you recommend for removing duplicates?
Does anyone perhaps already have an API tool for finding and removing duplicates from the vault?

I found that Mika Javanainen mentioned here and here some kind of tool that is available through support.
Does this tool still exist, and where can I find information about it?

  • Hello Joonas,

    I know about this option, but for us the output listed more than 10,000 files.

    I'm looking for a solution that avoids this:

    "...someone would need to go through these manually and decide"

    Let's assume I found 15,437 duplicated files in the vault. Some duplicate pairs have the same name, some do not. I can't select every second file in the view and click Delete. How can I automatically pair each duplicate with its original, and how can I define the condition that decides which file to keep?

  • There's no built-in way to do this as far as I know, so if you need to automate the cleanup, I think you'd have to do it through the API with a custom tool (see, for instance, the FindVaultDuplicates method). The tool Mika was referring to in that old thread did basically what the built-in option does nowadays.
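To illustrate the "pair and condition" part of such a custom tool, here is a minimal sketch in Python. It does not call the vault API: it assumes you have already exported one record per file (object ID, name, a hash of the file contents, creation date, workflow state) by whatever API route you choose. `FileRecord`, `plan_cleanup`, `keep_oldest` and `prefer_approved` are hypothetical names for illustration only. Files are paired by content hash rather than by name, so duplicates with different names are still matched, and the "condition" is just a function that picks the keeper from each group. The result is a reviewable list of (duplicate, keeper) pairs; nothing is deleted.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FileRecord:
    """One file as exported from the vault (hypothetical structure)."""
    object_id: int                   # ID of the vault object holding the file
    name: str                        # file name; may differ between duplicates
    content_hash: str                # hash of the file bytes, e.g. SHA-256
    created: datetime                # creation date of the object
    workflow_state: Optional[str] = None


def sha256_of(data: bytes) -> str:
    """Hash file contents so identical bytes give identical keys."""
    return hashlib.sha256(data).hexdigest()


def group_by_content(records: Iterable[FileRecord]) -> Dict[str, List[FileRecord]]:
    """Group records by content hash and keep only groups with 2+ members."""
    groups: Dict[str, List[FileRecord]] = defaultdict(list)
    for record in records:
        groups[record.content_hash].append(record)
    return {h: members for h, members in groups.items() if len(members) > 1}


# A "keep rule" is the condition: given one duplicate group, return the keeper.
KeepRule = Callable[[List[FileRecord]], FileRecord]


def keep_oldest(candidates: List[FileRecord]) -> FileRecord:
    """Default condition: keep the oldest object, lowest ID on ties."""
    return min(candidates, key=lambda r: (r.created, r.object_id))


def prefer_approved(candidates: List[FileRecord]) -> FileRecord:
    """Alternative condition: keep an 'Approved' object if any, else the oldest."""
    return min(
        candidates,
        key=lambda r: (r.workflow_state != "Approved", r.created, r.object_id),
    )


def plan_cleanup(
    records: Iterable[FileRecord], keep_rule: KeepRule = keep_oldest
) -> List[Tuple[int, int]]:
    """Return (duplicate_id, keeper_id) pairs; nothing is deleted here."""
    plan: List[Tuple[int, int]] = []
    for members in group_by_content(records).values():
        keeper = keep_rule(members)
        for record in members:
            if record.object_id != keeper.object_id:
                plan.append((record.object_id, keeper.object_id))
    return plan


if __name__ == "__main__":
    records = [
        FileRecord(1, "Spec.docx", sha256_of(b"abc"), datetime(2019, 1, 1)),
        FileRecord(2, "Spec (copy).docx", sha256_of(b"abc"), datetime(2020, 5, 1), "Approved"),
        FileRecord(3, "Other.docx", sha256_of(b"xyz"), datetime(2021, 1, 1)),
    ]
    print(plan_cleanup(records))                   # [(2, 1)]
    print(plan_cleanup(records, prefer_approved))  # [(1, 2)]
```

Keeping the condition as a separate function makes it easy to encode rules such as "keep the one in the Approved state" for the files that differ only in workflow, or to merge metadata into the keeper before the duplicates are removed through the API. An exact content hash only catches byte-identical files, so the generated pairs should still be spot-checked before anything is deleted.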