Information Extractor - HELP!

So, I'm trying to set up the Information Extractor to capture two values, Invoice Number and Invoice Value.

Currently this is just a proof of concept so I'm using a simple spreadsheet with the following data in it:

Invoice Number: 1234

Total: 20000

I have created two custom suggestion rules, one for Invoice Number and one for Invoice Value. Each is very simple:

{
"targetName": "Invoice Number",
"documentContentPattern": "\\b(Invoice Number)(:|\\s?)(\\s)?(?<value>\\w+)\\b",
"enabled": true,
"confidence": 0.95
},
{
"targetName": "Invoice Value",
"documentContentPattern": "\\b(Total)(:|\\s?)(\\s)?(?<value>\\w+)\\b",
"enabled": true,
"confidence": 0.95
}

I have also created two Suggestions, one for Invoice Number and one for Invoice Value, set each of them to target a specific property, and selected the relevant property.

The only difference is Invoice Number is a Text property, Invoice Value is a Number (Real) property.

I have tried everything I can think of and I cannot get M-Files to suggest a value for Invoice Value. I have tried replacing:

"\\b(Total)(:|\\s?)(\\s)?(?<value>\\w+)\\b"

with 

"\\b(Total)(:|\\s?)(\\s)?(?<value>\\d+)\\b"

So now I'm specifically looking only for digits, and it still suggests nothing.

If I change my "targetName": "Invoice Number" rule to look for numbers after the word Total rather than after Invoice Number, it picks up the correct Invoice Value (20000), but it is obviously suggesting it as my invoice number, not my invoice value. It picks it up whether I use \\w+ or \\d+.

But whatever I try to capture into my Invoice Value suggestion, I get nothing. It will not suggest anything.

Now, the only difference as I said is the property type. Can we not capture metadata suggestions for Number properties?

I couldn't see this listed as an exclusion in the instruction document.

If anyone can help I would be most grateful, I am tearing my hair out here :(

  • OK, so I thought I would add an extra property to my Invoice Class called 'Invoice Value Text' and try to capture the value (20000) into here.

    This has worked, which makes me suspect that the Information Extractor cannot suggest metadata values for Numerical Properties.

    However, for the proof of concept I'm creating, I need the Invoice Value to be in a Number property because in a workflow, I want to be able to evaluate if the value is above or below a certain amount.

    Any ideas welcomed :)

  • As you found out, Information Extractor cannot suggest values for numeric properties. There is an improvement request in our system with ID 153060 about this.

  • In the interim, one option would be to extract the data to a text property and then use some code (e.g. an event handler) to populate the numeric property that you need.
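
The conversion step that such an event handler would need can be sketched roughly as follows. This is written in Python purely to illustrate the logic; a real M-Files event handler would be written against the M-Files API, which is not shown here. The function name `text_to_number` is hypothetical, and the sketch assumes en-US formatting (comma as thousand separator, period as decimal point).

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def text_to_number(raw: str) -> Optional[Decimal]:
    """Convert extracted text such as '20000' or '20,000.50' to a Decimal.

    Returns None when the text is empty or is not a finite number, so the
    caller can leave the numeric property unset instead of storing garbage.
    Assumption: en-US formatting (',' groups thousands, '.' is the decimal point).
    """
    cleaned = raw.strip().replace(",", "")
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Decimal accepts 'NaN' and 'Infinity'; treat those as invalid input too.
    return value if value.is_finite() else None
```

With a numeric value in hand, the workflow comparison (above or below a certain amount) becomes an ordinary numeric test, for example `text_to_number("20000") > 10000`.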

  • There is a way that you can get Text Analytics to suggest metadata for Invoice Value.

    First, ensure you've installed the most recent version of Information Extractor.  You can download it from the M-Files Solution Catalog.

    Second, try the following configuration for your customMapping (note the "normalizeAs" value):

    {
        "targetName": "Invoice Value",
        "documentContentPattern": "\\b(Total)(:|\\s?)(\\s)?(?<value>\\d+)\\b",
        "normalizeAs": "Real(en-US)",
        "enabled": true,
        "confidence": 0.95
    }

    This will be able to suggest 20000 for the Invoice Value property.

    Note that neither the \d nor the \w character class matches punctuation such as periods and commas, so you will need to update your regular expression if you want to capture things such as decimals and/or thousand separators.

    I'll review the Information Extractor documentation to ensure that it explains how to configure suggestions for various property types.
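
As an illustration of the note above about \d and \w, here is one possible way to widen the value group so that it also accepts a decimal part and comma thousand separators. It is a sketch for testing the pattern outside M-Files, not a confirmed Information Extractor configuration. Python spells named groups `(?P<value>...)`, whereas the configuration in this thread uses `(?<value>...)`, and in the JSON setting each backslash must be doubled. Whether "normalizeAs" accepts a captured value that contains commas is not confirmed in this thread. The name `extract_total` is hypothetical.

```python
import re

# Same prefix as the rule in this thread; only the value group is widened.
# First alternative: digits grouped by commas, e.g. 20,000 or 1,234,567.89
# Second alternative: plain digits with an optional decimal part, e.g. 20000 or 1234.5
PATTERN = re.compile(
    r"\b(Total)(:|\s?)(\s)?"
    r"(?P<value>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)\b"
)


def extract_total(text):
    """Return the captured value text after 'Total', or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group("value") if match else None


if __name__ == "__main__":
    for sample in ("Total: 20000", "Total: 20,000.50", "Total 1234.5"):
        print(sample, "->", extract_total(sample))
```

Trying a pattern against a few sample strings like this before pasting it into the rule makes it easier to tell a regular expression problem apart from a property type problem.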

  • Thanks Chris, this could be real helpful if we fully understand how to use "NormalizeAs". I tried copying your example into a setup where the property is configured as Integer. It did not seem to work.

    Would like to see some documentation on what can be used as variables connected to NormalizeAs.

  • Hi - we will get the documentation updated ASAP.  In the meantime, you can use "Integer(en-US)" for properties configured with the "Number (integer)" type.

  • That's brilliant thanks very much Chris.