
OCR mail attachment

Former Member
In your opinion, what is the best/cleanest way to OCR mail attachments (TIF files from an MFC device) that are imported through an external mail source? The case is simply an e-mail with a TIF file attached that needs to be indexed.

While the External File Import source has an option to OCR the files being imported, the mail connector has no such option.

We can use a workflow action to OCR, but the result seems unpredictable: OCR isn't performed as it should be, and the action only creates a PDF, not a searchable one.

I want to avoid building a solution where an event handler has to download/extract the mail source imports to a file system folder, only for them to be imported again by a regular external file import job. That is a bit cumbersome.
  • We can use a workflow action to OCR, but the result seems unpredictable: OCR isn't performed as it should be, and the action only creates a PDF, not a searchable one.


    Did you run the OCR operation in a workflow state action script, or did you just use the built-in PDF conversion option in the workflow state? As far as I know, the PDF conversion doesn't do OCR; you need to trigger OCR in a script. Here's an example:

    Option Explicit

    ' Prepare the files of the object for modification by script.
    Dim files
    Set files = Vault.ObjectFileOperations.GetFilesForModificationInEventHandler( ObjVer )

    ' Prepare OCR options.
    Dim opts
    Set opts = CreateObject( "MFilesAPI.OCROptions" )
    opts.PrimaryLanguage = MFOCRLanguageEnglishUS
    opts.SecondaryLanguage = MFOCRLanguageFinnish

    ' Perform OCR on each of the convertible files.
    Dim file, ext
    For Each file In files

        ' Compare in lower case so that extensions such as "TIF" also match
        ' (string comparison with = is case-sensitive in VBScript).
        ext = LCase( file.Extension )

        ' Is the file in a convertible file format?
        If ext = "tif" Or _
                ext = "tiff" Or _
                ext = "jpg" Or _
                ext = "jpeg" Or _
                ext = "pdf" Then

            ' Convert this file to searchable PDF.
            Vault.ObjectFileOperations.PerformOCROperation ObjVer, file.FileVer, _
                    opts, MFOCRZoneRecognitionModeNoZoneRecognition, Nothing, True

        End If

    Next