Auto perform OCR at check-in

I might be missing something or simply getting old, but I am pretty sure there was an option to force an OCR operation on PDF documents at check-in time (assuming, obviously, that the OCR module is available).

How does one enforce that?

  • We had better luck implementing a commercially available OCR engine, OmniPage, and writing our own program that uses the OmniPage API, the M-Files API, and Workflow to automate the ingestion of documents that need OCR. The user identifies a source folder and a destination folder (which is in the Network Folder configuration). We then automatically copy all files that are not PDFs or TIFFs to the destination. We then inspect each PDF file, and if it has already been OCR'd, we move it to the destination directory. If it has not been OCR'd, we perform the OCR task using the API, and we convert TIFFs to PDF. We also track which documents fail OCR. We convert all the PDF documents to managed content and identify the source of the OCR: OmniPage, External, or Failed. This lets us use a robust OCR engine and avoid the problems found in the M-Files-supplied tools.
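
For anyone wanting to try something similar, the folder-routing part of the steps above could be sketched roughly as follows. This is only an illustration of the flow, not the poster's actual program: `ingest`, `has_text_layer`, and `run_ocr` are hypothetical names, and the two callables stand in for whatever the OCR engine's API and a PDF inspection step would really provide. The import into M-Files as managed content is not shown.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict

# File types that go through the OCR check; everything else is copied as-is.
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def ingest(
    source: Path,
    destination: Path,
    has_text_layer: Callable[[Path], bool],  # hypothetical: True if the PDF is already OCR'd
    run_ocr: Callable[[Path, Path], None],   # hypothetical: OCR input file, write PDF to output path
) -> Dict[str, str]:
    """Route files from source to destination.

    Returns a mapping of original file name -> OCR source label
    ("OmniPage", "External", or "Failed") for PDF and TIFF files.
    """
    destination.mkdir(parents=True, exist_ok=True)
    ocr_source: Dict[str, str] = {}

    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()

        # Files that are not PDFs or TIFFs are simply copied across.
        if suffix not in OCR_EXTENSIONS:
            shutil.copy2(str(path), str(destination / path.name))
            continue

        # PDFs that already carry text were OCR'd elsewhere: move them.
        if suffix == ".pdf" and has_text_layer(path):
            shutil.move(str(path), str(destination / path.name))
            ocr_source[path.name] = "External"
            continue

        # Everything else is OCR'd; TIFFs come out as PDFs.
        target = destination / (path.stem + ".pdf")
        try:
            run_ocr(path, target)
            ocr_source[path.name] = "OmniPage"
        except Exception:
            # Keep a record of failures instead of aborting the batch.
            ocr_source[path.name] = "Failed"

    return ocr_source
```

The returned labels could then be written to a property on each document when it is converted to managed content, so that failed items can be found with a search or view.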
