Auto perform OCR at check-in

I might be missing something, or simply getting old, but I am pretty sure there was an option to perform an OCR operation on PDF documents at check-in time (assuming, obviously, that the OCR module is available).

How does one enforce that?

  • Ooh, this is a good one: a head-scratcher that we did solve.

    We had a fun time figuring it out.

    We were migrating scanned PDFs from a legacy system and wanted M-Files to perform OCR on them. This pointed us in the right direction:

    We found this thread on the online M-Files Community: community.m-files.com/.../9173

    There, Joonas provided the script that prompts M-Files to run OCR on the PDF. This can't be done on initial import, because the file must already exist as version 1.

    So our workflow brings in the documents. After version 1 is checked in, there is a time delay before the document automatically moves to the next workflow state, which runs the OCR script.

    Our notes:

    We delayed the transition for automatically imported documents to speed up the import process. Without the delay, the importer waits for OCR to finish before moving on to the next document. With the delay, the documents queue up and the server runs OCR on them afterwards.

    Code found at: community.m-files.com/.../10703

    'Make the workflow pause x minutes before moving on.
    'Uses the Last Modified property (ID 21). Change it to (20) to use the Created date instead.
    'The delay will be at least x minutes. It can be up to 60 + x minutes, depending on when the M-Files server checks the conditions.
    'Place this script in the transition trigger.
    '2019.07.04 Karl Lausten
    'Modified by Jason vonI, Nov 22 2021, to use the Modified date/time, since it is deeper in the workflow.
    
    Option Explicit
    
    'Last Modified timestamp, converted from UTC to the server's local time
    Dim dModified : dModified = PropertyValues.SearchForProperty(21).TypedValue.GetValueAsTimeStamp().UtcToLocalTime().GetValue()
    
    'Desired delay in minutes:
    Dim iDelay : iDelay = 2
    Dim dGoAhead : dGoAhead = DateAdd("n", iDelay, dModified)
    
    'Uncomment to test the time settings and verify the setup:
    'Err.Raise MFScriptCancel, "dModified (UTC converted to local time): " & dModified & ", dGoAhead: " & dGoAhead & ", now (local time): " & Now
    
    'Allow the transition only once the delay has elapsed
    If Now > dGoAhead Then
        AllowStateTransition = True
    End If

    Good luck!

  • There are some downsides to this approach if you need to handle many documents in each run. By default, the OCR process attempts to handle batches of 100 documents, and it checks them all out before it starts processing. If something goes wrong halfway through, the remaining documents are left checked out. This may not be a big issue if the files have already been imported into M-Files. However, if they remain in a network folder, they will be left with the read-only attribute set, and you will need direct access to that network folder to remove it. So be careful and consider the implications before you set out to run OCR en masse.

  • Absolutely agree. We have been migrating from a legacy system, and for one of our document categories we had inherited thousands of non-OCRed PDF scans. We did plenty of testing before we committed to this. We ran into lots of issues until we implemented the time delay; after that, M-Files happily churned out OCRed PDFs.
