How to clean up duplicates in a vault

Hello Community,

We are looking for a way to find duplicates in a vault.
Our client has been working with duplicated files for several years and now wants to clean up the vault.
Some files have the same content but different metadata; others have the same content but different workflows.

What strategy would you recommend for removing duplicates?
Does anyone already have an API tool for finding and removing duplicates from the vault?
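
To make the question concrete, here is a minimal sketch of the kind of check we have in mind. It does not use the vault API at all: it assumes the files have first been exported to a local folder, and the function names `file_digest` and `find_duplicates` are just illustrative. It groups files by size and then by SHA-256 hash, so files with identical content end up in the same group regardless of their metadata or workflow.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root (an exported folder, assumed) by identical content."""
    # Cheap first pass: only files of equal size can be byte-identical.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    # Keep only groups that actually contain more than one file.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Each returned group would still have to be mapped back to the corresponding vault objects, and someone would have to decide which copy to keep based on its metadata and workflow state. That mapping and clean-up step is the part we are missing.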

I saw that Mika Javanainen mentioned (here and here) some kind of tool for this, which apparently is available through support.
Does this tool exist, and where can I find more information about it?

