M-Files OCR Module

We have a client that's using Smart Classifier in their vault. However, we're noticing that a lot of documents are skipped for classification because they lack a text layer. We were looking at options for OCR and came across the OCR Module for M-Files. From the description, it looks like the OCR module can be enabled for documents that are scanned into the vault. Does this also cover documents that are added to the vault via drag and drop, and can the conversion to a fully text-searchable PDF happen automatically?

Or are there other options for this kind of requirement?

Thank you

  • Hi there, 

    There's a chapter on text recognition (OCR) in the user guide. Have you had a look at it already?

    www.m-files.com/.../Scanning_text_recognition.html

  • If the client has Smart Classifier, they also have the option to use Discovery. With a bit of trickery you can indeed configure Discovery to identify PDF files without a text layer and then add a specific property to those documents. This property can be used to trigger a workflow that runs OCR on the document and saves the result as a text layer in a new version of the PDF. So it is possible, but it has limits. The OCR process is not suitable for handling large quantities of files, mainly because of the relatively high load on the server and particularly because it attempts to handle up to 100 files in each batch. If any one of them goes wrong, the remaining files in that batch may end up in limbo. So it would probably be OK to handle a few new files per hour, but be careful if you need to handle thousands of files.
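
To illustrate the batching concern above, here is a minimal, generic sketch of feeding documents to an OCR step in small batches, so that one bad file is recorded as failed instead of stranding the rest of its batch. It does not use any M-Files API: `run_ocr` is a hypothetical callable standing in for whatever actually triggers OCR on a document, and `batch_size` and `pause_seconds` are illustrative parameters, not M-Files settings.

```python
import time
from typing import Callable, List, Sequence, Tuple


def ocr_in_small_batches(
    doc_ids: Sequence[int],
    run_ocr: Callable[[int], None],  # hypothetical: triggers OCR for one document
    batch_size: int = 10,            # assumption: far below the 100-file batch
    pause_seconds: float = 0.0,      # assumption: optional pause to ease server load
) -> Tuple[List[int], List[int]]:
    """Process documents in small batches and return (succeeded, failed) IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    succeeded: List[int] = []
    failed: List[int] = []

    for start in range(0, len(doc_ids), batch_size):
        for doc_id in doc_ids[start:start + batch_size]:
            try:
                run_ocr(doc_id)
            except Exception:
                # Record the failure and carry on with the rest of the batch.
                failed.append(doc_id)
            else:
                succeeded.append(doc_id)

        # Pause between batches, but not after the last one.
        if pause_seconds > 0 and start + batch_size < len(doc_ids):
            time.sleep(pause_seconds)

    return succeeded, failed
```

The failed IDs can then be retried or reviewed by hand, rather than being left in an unknown state.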