I found a conversation from eight years ago where Mika Javanien says:
M-Files has a tool (extra cost) that will convert non-indexed PDF to text indexed PDF as a background task in M-Files server.
This tool was deprecated years ago and is no longer available because it caused a lot of issues. OCR is a heavy operation on the server, and M-Files OCR is not designed for mass operations such as converting thousands of documents in the vault to searchable in one go. OCR in M-Files is basically meant for interactive use: upload a document from a scanner and convert it to searchable as you store it in the vault.
If mass operations are needed, the recommendation is to use dedicated software or have OCR support directly in the scanner so that the documents are already searchable when they are stored to the vault. If the scanner does not have OCR support, you can also convert the documents to searchable as part of the import operation in external file sources or put them in a workflow where you convert them with a workflow script (PerformOCROperation method in the API). Users can also manually convert documents to searchable.
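As a rough illustration of the "dedicated software" route, here is a minimal Python sketch that converts a folder of scanned PDFs before they are imported into the vault. It assumes the open-source ocrmypdf command-line tool is installed; that tool is only an example of dedicated OCR software, not something M-Files ships or specifically recommends. The function names and the inbox/outbox folders are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def build_ocr_command(src: Path, dst: Path) -> List[str]:
    # Assumption: ocrmypdf is on PATH. --skip-text leaves pages that
    # already have a text layer untouched instead of failing on them.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(
    inbox: Path,
    outbox: Path,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[Path]:
    """Convert every PDF in inbox, one at a time, into outbox."""
    outbox.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        if dst.exists():
            # Already converted in an earlier run; do not redo the work.
            continue
        if run(build_ocr_command(src, dst)) == 0:
            converted.append(dst)
    return converted
```

You could then point an external file source (or any other import method) at the output folder, so the heavy OCR work happens outside the M-Files server, for example: ocr_folder(Path("scans/inbox"), Path("scans/searchable")).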
To add to Joonas Linkola's excellent reply, there's also a variety of third-party capture-and-extraction integrations listed in our Solution Catalog.
© 2025 M-Files, All Rights Reserved.