
Text Analytics detecting Payment Date: <date>

Hi all,

We are investigating Text Analytics to pre-populate metadata when we import scanned documents. We have been able to collect a lot of this information successfully from the OCRed document.

However, we are having an issue detecting the "Payment Date", which usually appears on the document after the text "Payment Date:".

See example below. The issue is that when these documents are scanned in, the date is in a separate block of text, as I've tried to indicate with the highlighting. 

Is what I'm trying to achieve possible? Or am I fighting a losing battle here?

As you can see, there are multiple dates. I can get it to suggest them all so that the user selects the correct one, but we are looking to automate all of this.

Thanks in advance!

Adrian

  • You should be able to create a regex that looks for a string combining "Payment Date:" and a date, and then use only the date part of that string. The challenge will probably be that the string varies depending on the document source. If you can create a short list of typical strings, then you should be able to create a regex that matches those. It won't catch the correct date on documents where your regex doesn't match the actual text, but at least you can make it work on the most common documents.

  • Hi, 

    I initially had a regex that looked for "Payment Date:" and then retrieved the value of the date. But the issue is that the string "Payment Date:" and the actual date appear to be in different blocks of text when OCRed.

  • Yes, that can be a challenge! I often open the PDF and then copy the relevant part of the text into Notepad to see how the computer reads it. The result can be surprising, but it is still useful when configuring the regex. The key is to isolate a pattern that will identify the desired date.

  • Thanks! 

    So, looking at my screenshot above, it isn't really going to be possible to detect the payment date, as it is in a different block of text? And if I included a wildcard between "Payment Date:" and the date, I would detect the record date instead?
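
The regex idea from earlier in the thread, and the wildcard concern raised here, can be sketched roughly as follows. This is only an illustration in Python, not the Text Analytics configuration itself: the date format, the list of label variants, and the helper name first_match are all assumptions and would need adjusting to the real documents.

```python
import re

# Assumed date format (e.g. 15/03/2021, 15.03.21); adjust to the real documents.
DATE = r"\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}"

# Hypothetical short list of label variants seen across document sources.
LABEL = r"(?:Payment\s+Date|Date\s+of\s+Payment|Date\s+Paid)\s*:?"

# Label followed by the date on the same line (spaces or tabs only in between).
SAME_LINE = re.compile(rf"{LABEL}[ \t]*({DATE})", re.IGNORECASE)

# Label, then anything at all, then the first date found (the "wildcard" variant).
WILDCARD = re.compile(rf"{LABEL}.*?({DATE})", re.IGNORECASE | re.DOTALL)


def first_match(pattern, text):
    """Return the captured date of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None
```

If the OCR output lists the labels in one block and the dates in another, SAME_LINE finds nothing, and WILDCARD simply returns whichever date comes first after the label, which may well be the record date. Whether that happens depends on how the text is actually laid out after OCR.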

  • That is not necessarily the case. You cannot determine it from the screenshot. The point is that you need to copy and paste that section of the document into a text editor to see how the computer reads it. Perhaps "Payment Date:" will show up after the actual date rather than before it, as you would expect. In that case you need to create a regex that looks before "Payment Date:", not after it. Or perhaps there is another unique pattern that you can use to select the desired date. It is not trivial, but it is often possible somehow.
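
As a rough sketch of the "look before as well as after" suggestion, the Python below tries a date directly after the label first and falls back to a date directly before it. It assumes a numeric date format and that only whitespace or line breaks sit between the label and its date; the names find_payment_date and show_raw are made up for the example, and the right pattern can only be chosen after inspecting the real OCR text.

```python
import re

# Assumed date format (e.g. 15/03/2021); adjust to the real documents.
DATE = r"\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}"
LABEL = r"Payment\s+Date\s*:?"

# Date directly after the label (only whitespace or newlines in between).
AFTER = re.compile(rf"{LABEL}\s*({DATE})", re.IGNORECASE)
# Date directly before the label (only whitespace or newlines in between).
BEFORE = re.compile(rf"({DATE})\s*{LABEL}", re.IGNORECASE)


def find_payment_date(ocr_text):
    """Try the date after the label first, then the date before it."""
    for pattern in (AFTER, BEFORE):
        m = pattern.search(ocr_text)
        if m:
            return m.group(1)
    return None


def show_raw(ocr_text):
    """Make newlines and tabs visible, similar to pasting into a text editor."""
    return repr(ocr_text)
```

For example, find_payment_date("Record Date: 01/03/2021\n15/03/2021\nPayment Date:") picks the date immediately before the label rather than the record date. If the label and date are separated by other text in the OCR output, neither pattern will match and a different anchor is needed.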