Text Analytics detecting Payment Date: <date>

Hi all,

We are investigating Text Analytics to pre-populate metadata when we import scanned documents. We have been able to collect a lot of this information successfully from the OCRed documents.

However, we are having an issue detecting the "Payment Date", which usually appears on the document after the text "Payment Date:".

See example below. The issue is that when these documents are scanned in, the date is in a separate block of text, as I've tried to indicate with the highlighting. 

Is what I'm trying to achieve possible? Or am I fighting a losing battle here?

As you can see, there are multiple dates. I can get it to suggest them all so that the user selects the correct one, but we are looking to automate all of this.

Thanks in advance!


  • That is not necessarily the case; you cannot tell from the screenshot. The point is that you need to copy and paste that section of the document into a text editor to see how the computer reads it. Perhaps "Payment Date:" shows up after the actual date rather than before it, as you would expect. In that case you need a regex that looks before "Payment Date", not after it. Or perhaps there is another unique pattern that you can use to select the desired date. It is not trivial, but it is often possible.
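
To illustrate the regex idea, here is a minimal Python sketch that tries the label-then-date order first and falls back to date-then-label. The numeric date format (such as 15/03/2023), the pattern names, and the helper `find_payment_date` are all assumptions for illustration; the real pattern depends on how the OCR engine actually orders and formats the text, and the regex syntax of your text analytics tool may differ.

```python
import re

# Assumed date format: day/month/year with "/", "." or "-" separators.
# Adjust this to whatever the pasted OCR text really contains.
DATE = r"\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}"

# Case 1: the label comes first, then (possibly on the next line) the date.
LABEL_THEN_DATE = re.compile(
    r"Payment\s+Date\s*:?\s*(" + DATE + r")", re.IGNORECASE
)

# Case 2: the OCR emitted the date block before the label block.
DATE_THEN_LABEL = re.compile(
    r"(" + DATE + r")\s*Payment\s+Date", re.IGNORECASE
)


def find_payment_date(text):
    """Return the date adjacent to the 'Payment Date' label, or None."""
    for pattern in (LABEL_THEN_DATE, DATE_THEN_LABEL):
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


# Hypothetical OCR output where the date block precedes its label.
sample = "Invoice Date: 01/02/2023\n15/03/2023\nPayment Date:\n"
print(find_payment_date(sample))
```

This only works when the date sits directly next to the label in the extracted text. If other text blocks end up between them, you would need a different anchor, which is why checking the raw text first matters.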