M-Files Text Analytics

Hi,

I am trying to configure M-Files Text Analytics. The subscription ID appears in my documents (for example, Subscription ID: 919-23456789), and I am trying to extract it into a metadata field of the text data type.

Although I have tried many documentContentPattern values, none of them has extracted the subscription ID from the relevant documents.

If someone could help me with this, it would be greatly appreciated.

Here is the M-Files Text Analytics script:

{
"targetName": "Subscription ID",
"documentContentPattern": "Subscription ID:\\s*919([+-]?(?=\\.\\d|\\d)(?:\\d+)?(?:\\.?\\d*))(?:[Ee]([+-]?\\d+))?",
"comment": "Analyze the value after 'Subscription ID:' (For eg: 919-23568976) following the document pattern content.",
"enabled": true,
"forceSetValue": true,
}

BR,

Umesh

  • Something like this should capture the 8-digit part of the subscription ID following the dash:

    Subscription ID:\\s*919[+-]?(?<value>\\d{8})

    If you want to include the '919-' part you can just move '(?<value>' and make it like this:

    Subscription ID:\\s*(?<value>919[+-]?\\d{8})
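For anyone who wants to check the two patterns above outside M-Files, here is a minimal Python sketch. It assumes the configuration value is a JSON string (hence the doubled backslashes) and that the pattern uses .NET-style named groups, which Python spells `(?P<value>...)`. The helper names `to_python_regex` and `capture` are made up for this example, and Python's regex engine may differ from the one M-Files uses in edge cases.

```python
import json
import re


def to_python_regex(configured: str) -> str:
    # Undo the JSON escaping (double backslash becomes single), then switch
    # the named-group syntax from (?<value>...) to Python's (?P<value>...).
    raw = json.loads('"' + configured + '"')
    return raw.replace("(?<value>", "(?P<value>")


def capture(configured: str, text: str):
    # Return what the "value" group captured, or None if nothing matched.
    match = re.search(to_python_regex(configured), text)
    return match.group("value") if match else None


# Sample text taken from the original question.
sample = "Subscription ID: 919-23456789"

# Patterns exactly as they would be typed in the configuration.
digits_only = r"Subscription ID:\\s*919[+-]?(?<value>\\d{8})"
with_prefix = r"Subscription ID:\\s*(?<value>919[+-]?\\d{8})"

print(capture(digits_only, sample))  # 23456789
print(capture(with_prefix, sample))  # 919-23456789
```

Only the text inside the `value` group is returned, so moving the opening `(?<value>` is all it takes to include or drop the `919-` prefix.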

  • Hi Karl

    One more question: I would also like to extract "Quantity" and "Total Price Per Month" from the documents (snip attached). Is it possible to read the quantity and suggest it in the metadata?

    I tried the following script to extract Total Price Per Month, but it doesn't work. It looks like there is an issue with the columns, but I'm not sure what exactly is wrong.

    "targetName": "Total Price Per month",
    "documentContentPattern": "Total Price per Month\\s+\\$[0-9]+\\,[0-9]+\\.[0-9]{1,2}",
    "comment": "Monthly total price of the license",
    "enabled": true,
    "forceSetValue": true,
    "confidence": 0.95

    Could you please let me know whether it is possible or not?

    Thank you so much.

    BR,

    Umesh

  • Hi Umesh,

    It should be possible. However, you need to include (?<value>[some code]) in your configuration to let M-Files know which part of the pattern you want to capture.

    I often open the source document and copy paste the relevant lines into Notepad or Notepad++ to see how the computer reads it. Sometimes you will find surprising line breaks and other special characters that you need to include in your pattern. It is a good idea to test your pattern with Expresso or similar tools, see https://ultrapico.com
    Remember to reduce the double backslashes when testing and add them again when configuring M-Files.

    BR, Karl
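A rough Python sketch of the two points above: wrapping the part to capture in a `value` group, and reducing the double backslashes before testing. The placement of the group around the amount is a guess (the pattern that eventually worked was not posted), the sample texts are made up, and Python needs `(?P<value>...)` where the configuration uses `(?<value>...)`.

```python
import json
import re

# Hypothetical variant of the pattern from the question, with the amount
# wrapped in (?<value>...), written as it would appear in the configuration.
configured = r"Total Price per Month\\s+(?<value>\\$[0-9]+\\,[0-9]+\\.[0-9]{1,2})"

# Step 1: reduce the double backslashes, as one would before testing in Expresso.
for_testing = json.loads('"' + configured + '"')
print(for_testing)

# Step 2: Python spells named groups (?P<name>...), so adapt the syntax.
price = re.compile(for_testing.replace("(?<value>", "(?P<value>"))


def total_price(text: str):
    # Return the captured amount, or None if the pattern does not match.
    match = price.search(text)
    return match.group("value") if match else None


# Made-up document fragments, not taken from the real documents.
print(total_price("Total Price per Month   $1,250.00"))
print(total_price("Total Price per Month\n$1,250.00"))
print(total_price("Total Price per Month $950.00"))
```

The second fragment shows that `\s+` also spans a line break between the label and the amount. The third one finds nothing because this pattern requires a thousands separator, so amounts below 1,000 would need the comma part made optional.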

  • Hi Karl

    Thank you so much for your response.

    I can now successfully read and analyse "Total Price Per Month" into the metadata, but I am still struggling to read the quantity (Qty). When I copy and paste the relevant lines from the above table into Notepad, it looks something like below:

    I would like to read the value 50 into the Quantity (Qty) field on the metadata card. I'm not sure whether that is possible.

    I really appreciate your assistance.

    BR,

    Umesh Pandey

  • Well, first of all it is a challenge that quantity occurs multiple times in the text. So which one is it you want to capture? How can M-Files select the right one?

    Once you have figured that out, you need to create a regex that will pick out the desired number.
    It looks like the number is always placed between a date and a $ character. So the regex might look something like this:

    \d{1,2}-\w{3}-\d{4}\s+(?<value>\d+)\s+\$.+

    (one or two digits, -, three word characters, -, four digits, 1 or more white space, capture one or more digits, one or more white space, $, one or more any character)
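The description above (a date, white space, the number, white space, a dollar sign) can be tried out in Python with a small sketch like the one below. The table rows are invented, since the real layout was only shown in a screenshot, and Python's `(?P<value>...)` stands in for `(?<value>...)`.

```python
import re

# The date-number-dollar idea in Python syntax: (?P<value>...) instead of
# (?<value>...), a four-digit year written as \d{4}, and a literal dollar
# sign written as \$.
quantity = re.compile(r"\d{1,2}-\w{3}-\d{4}\s+(?P<value>\d+)\s+\$.+")

# Made-up table rows; the real layout was only shown in a screenshot.
text = (
    "Start Date  Qty  Unit Price  Total\n"
    "01-Jan-2023 50 $25.00 $1,250.00\n"
    "15-Feb-2023 10 $5.00 $50.00\n"
)

# The first row that fits the pattern.
first = quantity.search(text)
print(first.group("value"))

# Every row that fits the pattern.
all_quantities = [m.group("value") for m in quantity.finditer(text)]
print(all_quantities)
```

With more than one table row, every row matches, which is the ambiguity raised at the start of this reply: the pattern needs extra context (a product name, a fixed label) if only one particular quantity should be suggested.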

  • Hi Karl

    I have applied the same script, but it is not working as it is supposed to. Is there anything I need to consider when extracting a quantity that sits in the middle of the text?

    Thanks

  • I absolutely love Expresso from Ultrapico!  I use it whenever I have to write more complex Regular Expressions!

  • Have you tried putting the expression and the source text into Expresso (the tool that Karl suggested) to see what happens?

  • Hi Craig

    I hope you are doing great.

    Yes, I've tried the expression in Expresso, and it appears that the pattern is correct and that the named capture group picks up the right value. However, I still didn't receive any suggestion in the metadata field (Quantity).

    Let me know if you have any other way to extract the values. 

    Any suggestion will be highly appreciated.

    Thank you.