
Duplicate File Prevention

Former Member
I am finding a lot of threads and posts about people wanting to prevent duplicate files from being imported into M-Files, but most of them are over a year old and each offers a different partial solution.

What is the best modern solution to prevent employees from submitting duplicate files?

At this point, even preventing them from uploading files with the same file name would be good enough to prevent some of the mistakes.

Thank You,
Royce
  • Install the free-of-charge Configuration Accelerators module and use Unique Object Enforcement.
    Contact your M-Files reseller to get the module, or download it yourself from catalog.m-files.com/.../
    Remember to download the Configuration Accelerators License as well.
  • Note that the Unique Object Enforcement module only prevents duplicates based on the object metadata, to fulfill requirements such as "There can be only one project plan for each project" (uniqueness based on the Class and Project properties). It does not check or compare file contents in any way.
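    For illustration, here is a rough Python sketch of the two kinds of check (illustrative only, not M-Files API code; the property names are invented):

        import hashlib

        # Metadata-based uniqueness, which is what Unique Object Enforcement does:
        # two objects collide when the configured combination of property values matches.
        def metadata_key(obj):
            return (obj["class"], obj["project"])   # e.g. one project plan per project

        # Content-based uniqueness, which UOE does NOT perform:
        # identical bytes produce the same digest regardless of metadata.
        def content_key(path):
            with open(path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()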

  • Former Member
    Okay, thank you for the information, bright-ideas.dk and Joonas Linkola.

    Now that I know the Unique Object Enforcement (UOE) module is the best way to approach the problem, can I get clarification on a specific scenario?

    We are transferring over 150,000 files (374 GB) from an old document management system. We are not sure which of them are duplicates, and after the transfer we want to prevent new duplicates from being submitted as well. You say that the UOE module prevents duplicate object metadata, but I am uncertain whether it looks at all metadata or whether we can set it to look at specific properties.

    Do we choose which properties are looked at using this module?
    If we want it to look at only the name of the file, would that work?
    If we want it to look at only the exact size of a file, would that work?
    What happens when a duplicate is detected based on the properties we configure? (Is there an overwrite option, or just an error?)

    We want it to be accurate, but we also do not want it to ruin the import by removing non-duplicates. For example, if an invoice package contains four files that all have very similar properties because they are related to each other, we do not want one or two of them getting blocked, since untangling that afterwards would be a nightmare.

    Any guidance that you guys can provide would be appreciated!

    Sincerely,
    Royce
  • Former Member
    This sounds like something you could handle before everything is imported: write a script that calculates a hash of each file and removes the duplicates.
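    For example, a minimal sketch in Python that only reports the duplicate groups rather than deleting anything (the folder path is a placeholder; review the output before removing files):

        import hashlib
        import os
        from collections import defaultdict

        def sha256(path, chunk=1 << 20):
            # Hash in chunks so large files do not need to fit in memory.
            h = hashlib.sha256()
            with open(path, "rb") as f:
                while block := f.read(chunk):
                    h.update(block)
            return h.hexdigest()

        groups = defaultdict(list)
        for root, _, files in os.walk("export_folder"):   # placeholder path
            for name in files:
                path = os.path.join(root, name)
                groups[sha256(path)].append(path)

        # Any digest with more than one path is a set of byte-identical files.
        for digest, paths in groups.items():
            if len(paths) > 1:
                print(digest)
                for p in paths:
                    print("  " + p)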
  • Former Member
    There are many programs, such as CCleaner, that can detect duplicate files before they are imported, but the idea is to compare incoming files against objects already in the vault. In a business environment this needs to be simple for office workers: we have to compare each file being added against the objects already in the vault, every day, on dozens of devices, and it cannot be something like a script that the users run themselves.

    The mass import should be done within a few weeks. We decided to go with the M-Files Mass Importer software, which takes the metadata from our previous document management system (supplied as a CSV sheet) together with all of the documents and combines them into the vault. It does not really solve the problem of comparing incoming files against the objects already imported, but it is a start.
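    One way to at least flag content duplicates before the Mass Importer run: if the CSV has a column pointing at each file on disk, a short script can hash every listed file and split out the rows whose content has already been seen. A rough sketch, assuming a column named "FilePath" and the file names shown here (placeholders, not Mass Importer conventions):

        import csv
        import hashlib

        def sha256(path, chunk=1 << 20):
            # Hash in chunks so large files do not need to fit in memory.
            h = hashlib.sha256()
            with open(path, "rb") as f:
                while block := f.read(chunk):
                    h.update(block)
            return h.hexdigest()

        seen = {}
        with open("import.csv", newline="") as src, \
             open("duplicates.csv", "w", newline="") as dup:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dup, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                digest = sha256(row["FilePath"])   # "FilePath" column is an assumption
                if digest in seen:
                    writer.writerow(row)           # same content as an earlier row
                else:
                    seen[digest] = row["FilePath"]

    The rows written to duplicates.csv can then be reviewed by hand before deciding what to feed to the importer.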