Duplicate detection based on file content?

I believe there have been several discussions in the past about performing duplicate detection based on file data (not on the metadata, which, to the best of my knowledge, is what the compliance kit “unique object enforcement” does).

This could be implemented by calculating a hash for checked-in objects and making sure it is unique.
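To make the idea concrete, here is a minimal sketch of such a check in plain Python. It does not use any M-Files API: the `DuplicateIndex` class, the `check_in` method, the object IDs, and the choice of SHA-256 are all assumptions made for illustration only.

```python
import hashlib
from typing import Dict, Optional


def content_hash(data: bytes) -> str:
    """Return a hex digest of the raw file bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class DuplicateIndex:
    """Hypothetical in-memory index mapping content hashes to object IDs."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def check_in(self, object_id: str, data: bytes) -> Optional[str]:
        """Register a file; return the ID of an existing duplicate, or None."""
        digest = content_hash(data)
        existing = self._seen.get(digest)
        if existing is not None:
            # Same bytes were already checked in under another object.
            return existing
        self._seen[digest] = object_id
        return None


index = DuplicateIndex()
first = index.check_in("doc-1", b"quarterly report")
second = index.check_in("doc-2", b"quarterly report")   # identical bytes
third = index.check_in("doc-3", b"quarterly report v2")  # different bytes
```

In this sketch `second` comes back as `"doc-1"` while `first` and `third` are `None`. A real vault would need a persistent store for the hashes rather than a dictionary held in memory.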

I’m pretty sure this is not an exotic requirement, so I was wondering whether anyone has done such a development and is willing to share (experience, advice, and/or actual code!).

  • Duplicate detection based on file content was introduced in March 2020, so if you are running an older M-Files version you should upgrade to get the feature. If you are running a later version already, make sure the feature is not disabled in the vault configuration: How to Disable Duplicate File Detection

  • Thanks - I must have missed that one. We are indeed running the latest version and get alerts from time to time, but I’m pretty sure we still have duplicates slipping in.
    Two more questions, if I may:

    I’m not sure I understand “In some cases, the MD5 value can be different even when the file content of the compared documents is the same. This happens because some files formats store additional metadata in the file which changes the MD5 value.” - would you have an example?

    Is there a “hidden” metadata property that can surface the hash in the GUI (or API)?
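    As one possible illustration of the quoted sentence (a sketch only, not taken from the M-Files documentation): container formats such as ZIP store a timestamp for every entry, so two archives holding exactly the same payload can still hash differently. The file name `document.txt`, the timestamps, and the helper names below are assumptions chosen for the example.

```python
import hashlib
import io
import zipfile
from typing import Tuple


def make_archive(payload: bytes, timestamp: Tuple[int, int, int, int, int, int]) -> bytes:
    """Build a ZIP archive in memory with one entry stamped with `timestamp`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("document.txt", date_time=timestamp)
        zf.writestr(info, payload)
    return buf.getvalue()


def inner_content(archive: bytes) -> bytes:
    """Read the payload back out of the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("document.txt")


payload = b"same content"
archive_a = make_archive(payload, (2020, 3, 1, 12, 0, 0))
archive_b = make_archive(payload, (2021, 3, 1, 12, 0, 0))

# Hash of the whole file, including the embedded timestamps.
md5_a = hashlib.md5(archive_a).hexdigest()
md5_b = hashlib.md5(archive_b).hexdigest()

# The two file-level hashes differ, but the payloads are byte-for-byte equal.
hashes_differ = md5_a != md5_b
payloads_equal = inner_content(archive_a) == inner_content(archive_b)
```

    Here `hashes_differ` and `payloads_equal` are both true. Office formats such as DOCX are ZIP containers too, so re-saving a document without changing its text can plausibly have the same effect.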
