Information Extractor - HELP!

So, I'm trying to set up the Information Extractor to capture two values: Invoice Number and Invoice Value.

Currently this is just a proof of concept so I'm using a simple spreadsheet with the following data in it:

Invoice Number: 1234

Total: 20000

I have created two custom suggestion rules, one for Invoice Number and one for Invoice Value. Each is very simple:

{
"targetName": "Invoice Number",
"documentContentPattern": "\\b(Invoice Number)(:|\\s?)(\\s)?(?<value>\\w+)\\b",
"enabled": true,
"confidence": 0.95
},
{
"targetName": "Invoice Value",
"documentContentPattern": "\\b(Total)(:|\\s?)(\\s)?(?<value>\\w+)\\b",
"enabled": true,
"confidence": 0.95
}

I have also created two Suggestions, one for Invoice Number and one for Invoice Value, set each of them to target a specific property, and selected the relevant property.

The only difference is Invoice Number is a Text property, Invoice Value is a Number (Real) property.

I have tried everything I can think of and I cannot get M-Files to suggest a value to me for Invoice Value. I have tried replacing:

"\\b(Total)(:|\\s?)(\\s)?(?<value>\\w+)\\b"

with 

"\\b(Total)(:|\\s?)(\\s)?(?<value>\\d+)\\b"

That way I'm looking specifically for digits only, and it still does nothing.

If I change my "targetName": "Invoice Number" rule to look for numbers after the word Total rather than Invoice Number, it picks up my correct Invoice Value (20000), but it is obviously suggesting it as my invoice number, not my invoice value. It picks it up whether I use \\w+ or \\d+.
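As a sanity check outside M-Files, the patterns can be run against the sample text directly. This is only a minimal sketch in Python, assuming the text extracted from the spreadsheet is exactly the two lines shown earlier. Python writes the named group as (?P<value>...) where the rules use (?<value>...), and the doubled backslashes needed in JSON become single backslashes in a raw string. The dictionary keys are made up for the illustration.

```python
import re

# Assumed extraction result: the two lines from the proof-of-concept spreadsheet.
SAMPLE = "Invoice Number: 1234\n\nTotal: 20000\n"

# The rule patterns with JSON escaping removed and the named group rewritten
# in Python syntax, (?P<value>...) instead of (?<value>...).
PATTERNS = {
    "number": r"\b(Invoice Number)(:|\s?)(\s)?(?P<value>\w+)\b",
    "total_w": r"\b(Total)(:|\s?)(\s)?(?P<value>\w+)\b",
    "total_d": r"\b(Total)(:|\s?)(\s)?(?P<value>\d+)\b",
}


def extract(pattern, text):
    """Return the text captured by the 'value' group, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group("value") if match else None


results = {name: extract(pattern, SAMPLE) for name, pattern in PATTERNS.items()}
for name, value in results.items():
    print(name, "->", value)
```

All three patterns capture a value from this sample, which is consistent with the regular expression not being the part that fails for Invoice Value.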

But whatever I try to capture for my Invoice Value suggestion: nothing. It will not suggest anything.

Now, the only difference as I said is the property type. Can we not capture metadata suggestions for Number properties?

I couldn't see this listed as an exclusion in the instruction document.

If anyone can help I would be most grateful, I am tearing my hair out here :(

  • There is a way that you can get Text Analytics to suggest metadata for Invoice Value.

    First, ensure you've installed the most recent version of Information Extractor.  You can download it from the M-Files Solution Catalog.

    Second, try the following configuration for your customMapping (note the "normalizeAs" value):

    {
        "targetName": "Invoice Value",
        "documentContentPattern": "\\b(Total)(:|\\s?)(\\s)?(?<value>\\d+)\\b",
        "normalizeAs": "Real(en-US)",
        "enabled": true,
        "confidence": 0.95
    }

    This will be able to suggest 20000 for the Invoice Value property.

    Note that neither the \d nor the \w character class matches punctuation, so you will need to update your regular expression if you want to capture things such as decimal points and/or thousands separators.

    I'll review the Information Extractor documentation to ensure that it explains how to configure suggestions for various property types.
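On the point about punctuation, one way to widen the value group is to allow commas between digits and an optional decimal part. The sketch below is an illustration in Python, not a tested M-Files configuration: the pattern and the helper name extract_total are assumptions, Python needs (?P<value>...) where the rule uses (?<value>...), and the comma stripping only mimics how an en-US reading would treat "20,000.50". It does not show what "normalizeAs" does internally.

```python
import re

# Hypothetical widened pattern: a digit, then any run of digits and commas,
# then an optional ".digits" decimal part.
TOTAL_PATTERN = re.compile(
    r"\b(Total)(:|\s?)(\s)?(?P<value>\d[\d,]*(?:\.\d+)?)\b"
)


def extract_total(text):
    """Return the total as a float using en-US conventions, or None if absent."""
    match = TOTAL_PATTERN.search(text)
    if match is None:
        return None
    # en-US convention: "," groups thousands and "." marks the decimal point.
    return float(match.group("value").replace(",", ""))


for sample in ("Total: 20000", "Total: 20,000.50", "Total: 1,234"):
    print(sample, "->", extract_total(sample))
```

If the same idea were carried into the rule, the JSON-escaped value group would presumably read (?<value>\\d[\\d,]*(?:\\.\\d+)?), but that has not been verified against Information Extractor.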

  • Thanks Chris, this could be real helpful if we fully understand how to use "NormalizeAs". I tried copying your example into a setup where the property is configured as Integer. It did not seem to work.

    Would like to see some documentation on what can be used as variables connected to NormalizeAs.
