
M-Files Text Analytics

Hi,

I am trying to configure M-Files Text Analytics. My documents contain a Subscription ID (for example, Subscription ID: 919-23456789), and I want to extract it into a metadata property of the text data type.

I have tried many documentContentPattern values, but none of them has picked up the subscription ID from the relevant documents.

If someone could help me with this, it would be greatly appreciated.

Here is the M-Files Text Analytics script:

{
"targetName": "Subscription ID",
"documentContentPattern": "Subscription ID:\\s*919([+-]?(?=\\.\\d|\\d)(?:\\d+)?(?:\\.?\\d*))(?:[Ee]([+-]?\\d+))?",
"comment": "Analyze the value after 'Subscription ID:' (For eg: 919-23568976) following the document pattern content.",
"enabled": true,
"forceSetValue": true
}

BR,

Umesh

  • Something like this should capture the 8-digit part of the subscription ID following the dash:

    Subscription ID:\\s*919[+-]?(?<value>\\d{8})

    If you want to include the '919-' part you can just move '(?<value>' and make it like this:

    Subscription ID:\\s*(?<value>919[+-]?\\d{8})
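A quick way to check both variants outside M-Files is a few lines of Python. This is only a sketch: Python's `re` module spells named groups `(?P<value>...)`, so the patterns are adapted from the `(?<value>...)` form above, and the single backslashes are the unescaped form you would use in a regex tester. The sample line is the one from the original post.

```python
import re

# Python needs (?P<value>...); the thread's (?<value>...) spelling is kept for M-Files itself.
PATTERN_DIGITS = r"Subscription ID:\s*919[+-]?(?P<value>\d{8})"
PATTERN_FULL = r"Subscription ID:\s*(?P<value>919[+-]?\d{8})"

sample = "Subscription ID: 919-23456789"

# The named group "value" is the part that would be suggested as metadata.
digits_only = re.search(PATTERN_DIGITS, sample).group("value")
full_id = re.search(PATTERN_FULL, sample).group("value")

print(digits_only)  # 23456789
print(full_id)      # 919-23456789
```

Moving the opening of the group in front of `919` is the only difference between the two patterns, and it decides whether the prefix ends up in the captured value.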

  • Hi Karl

    One more question: I would also like to analyse "Quantity" and "Total Price Per Month" from the documents (snip attached). Is it possible to read the quantity and suggest it in the metadata?

    I tried the following script to extract Total Price per Month, but it doesn't work. It looks like there is some issue with the columns, but I am not sure what exactly is wrong.

    "targetName": "Total Price Per month",
    "documentContentPattern": "Total Price per Month\\s+\\$[0-9]+\\,[0-9]+\\.[0-9]{1,2}",
    "comment": "Monthly total price of the license",
    "enabled": true,
    "forceSetValue": true,
    "confidence": 0.95

    Could you please let me know whether it is possible or not?

    Thank you so much.

    BR,

    Umesh

  • Hi Umesh,

    It should be possible. However, you need to include (?<value>[some pattern]) in your configuration to tell M-Files which part of the pattern you want to capture.

    I often open the source document and copy and paste the relevant lines into Notepad or Notepad++ to see how the computer reads it. Sometimes you will find surprising line breaks and other special characters that you need to include in your pattern. It is a good idea to test your pattern with Expresso or a similar tool, see https://ultrapico.com
    Remember to replace each double backslash with a single one when testing, and to double them again when configuring M-Files.

    BR, Karl
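The backslash doubling can be done mechanically rather than by hand. Here is a minimal sketch using Python's `json` module, on the assumption that the double backslashes in the configuration are ordinary JSON string escaping. The placement of `(?<value>...)` around the amount is an illustrative guess, not something confirmed in this thread.

```python
import json

# Pattern as you would paste it into a regex tester: single backslashes.
# Where (?<value>...) goes here is an assumption made for illustration.
raw_pattern = r"Total Price per Month\s+(?<value>\$[0-9]+\,[0-9]+\.[0-9]{1,2})"

# json.dumps doubles every backslash and adds the surrounding quotes,
# which is the form expected inside the JSON configuration.
config_value = json.dumps(raw_pattern)
print(config_value)

# json.loads reverses the escaping, giving back the tester-friendly form.
round_tripped = json.loads(config_value)
print(round_tripped == raw_pattern)  # True
```

Going through `json.dumps` and `json.loads` avoids the easy mistake of missing one backslash when converting between the two forms.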

  • Hi Karl

    Thank you so much for your response.

    I can now successfully read and analyse "Total Price Per Month" into the metadata. But I am still struggling to read the quantity (Qty). When I copy and paste the relevant lines from the above table into Notepad, they look something like below:

    I would like to read the value 50 into the Quantity (Qty) property on the metadata card. Is that possible?

    I really appreciate your assistance.

    BR,

    Umesh Pandey

  • Well, first of all it is a challenge that quantity occurs multiple times in the text. So which one is it you want to capture? How can M-Files select the right one?

    Once you have figured that out, you need to create a regex that will pick out the desired number.
    It looks like the number is always placed between a date and a $ character. So the regex might look something like this:

    \d{1,2}-\w{3}-\d4\s+(?<value>\d+)\s+$.+

    (one or two digits, -, three word characters, -, four digits, 1 or more white space, capture one or more digits, one or more white space, $, one or more any character)

  • Hi Karl

    I have applied the same script, but it is not working as it is supposed to. Is there anything I need to consider when extracting a quantity that sits in the middle of the text?

    Thanks

  • Have you tried putting the expression and source text into Expresso (the tool that Karl suggested) to see what happens?

  • Hi Craig

    I hope you are doing great.

    Yes, I've tried the expression in Expresso. The script appears to be correct and the named capture group matches the right part, but I still don't get any suggestion in the metadata field (Quantity).

    Let me know if you have any other way to extract the values. 

    Any suggestion will be highly appreciated.

    Thank you.

  • Just noticed that I have forgotten curly brackets around the number of digits in the year part. It should be:

    \d{1,2}-\w{3}-\d{4}\s+(?<value>\d+)\s+$.+

    Have you found a solution to pick which of the results you want to capture? Your input has multiple instances with quantity.
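To see why the missing curly brackets matter, here is a small Python sketch that compares the two date fragments. The date strings are hypothetical examples in the day-month-year shape the pattern expects; they are not taken from the original document.

```python
import re

# Hypothetical date string in the shape the pattern targets.
date_text = "12-Jan-2023"

without_braces = r"\d{1,2}-\w{3}-\d4"    # \d4 means: one digit, then a literal "4"
with_braces = r"\d{1,2}-\w{3}-\d{4}"     # \d{4} means: exactly four digits

no_match = re.search(without_braces, date_text)
full_match = re.search(with_braces, date_text).group()

# The version without braces only matches when the year happens to be a digit followed by "4".
accidental = re.search(without_braces, "12-Jan-24").group()

print(no_match)    # None
print(full_match)  # 12-Jan-2023
print(accidental)  # 12-Jan-24
```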

  • Hi Karl

    I am looking to capture all the quantity fields that are listed in the table, and I am not sure whether that is something Text Analytics can do. I have also tried the above script with the curly brackets added, but no luck.

  • Yeah, well I also forgot to put a backslash in front of the $ character. I had forgotten that $ has a special meaning in regex.

    \d{1,2}-\w{3}-\d{4}\s+(?<value>\d+)\s+\$.+

    This regex has been tested in Expresso, and it will provide two results from your example above. Not sure how M-Files will respond to that.

    Please note the function in Expresso where you can type or paste a sample text and then run a match to see if your regex produces the desired results.
    There are plenty of good reference sheets for regex on the web. Print one out and keep it at hand when experimenting with regex. It can be rather tricky at first!
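The effect of escaping the $ and the "two results" behaviour can both be reproduced in a few lines of Python. This is a sketch under assumptions: the sample rows are made up to resemble a date, a quantity, and a dollar amount, because the original table is not shown in the thread, and the group is written as `(?P<value>...)` because that is how Python spells named groups.

```python
import re

# Made-up rows: description, date, quantity, dollar amount.
sample = (
    "Licence A 01-Jan-2023 50 $1,250.00\n"
    "Licence B 15-Feb-2023 20 $500.00\n"
)

# Unescaped $ is an end-of-text anchor, so nothing can follow it and the pattern never matches.
unescaped = r"\d{1,2}-\w{3}-\d{4}\s+(?P<value>\d+)\s+$.+"
# Escaped \$ matches a literal dollar sign.
escaped = r"\d{1,2}-\w{3}-\d{4}\s+(?P<value>\d+)\s+\$.+"

unescaped_result = re.search(unescaped, sample)
quantities = [m.group("value") for m in re.finditer(escaped, sample)]

print(unescaped_result)  # None
print(quantities)        # ['50', '20']
```

With more than one row in the text the pattern produces more than one candidate, which is the open question above about which quantity should end up in the metadata.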

  • Hi Karl

    Finally, I got the desired quantity from the documents.

    Thank you so much for your suggestions. I will definitely get a reference sheet for regex before I experiment with other documents.

    I really appreciate your help mate.