Text Analytics detecting Payment Date: <date>

Hi all,

We are investigating Text Analytics to pre-populate metadata when we import scanned documents. We have been able to collect a lot of this information successfully from the OCRed documents.

However, we are having an issue detecting the "Payment Date", which usually appears on the document after the text "Payment Date:".

See example below. The issue is that when these documents are scanned in, the date is in a separate block of text, as I've tried to indicate with the highlighting. 

Is what I'm trying to achieve possible? Or am I fighting a losing battle here?

As you can see, there are multiple dates. I can get it to suggest them all so that the user has to select the correct one, but we are looking to automate all of this.

Thanks in advance!

Adrian

  • You should be able to create a regex that looks for a string combining "Payment Date:" and a date, and then use only the date part of that string. The challenge will probably be that the string varies depending on the document source. If you can put together a short list of typical strings, you should be able to create a regex that matches those. It won't catch the correct date on documents where your regex doesn't match the actual text, but at least you can make it work on the most common documents.
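
As a rough illustration of the regex idea above, here is a minimal Python sketch. The date formats, the label variants, and the function name `find_payment_date` are assumptions made for the example, not something taken from your documents or from the product, and the regex flavour in your Text Analytics configuration may differ. The key point is that the whitespace between the label and the date is allowed to include line breaks, so a date that the OCR put into the next text block can still be picked up.

```python
import re
from typing import Optional

# Assumed date shapes: 31/12/2024, 31-12-2024, 31.12.2024, or 31 December 2024.
DATE = r"(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}|\d{1,2}\s+[A-Za-z]{3,9}\s+\d{4})"

# Assumed label variants ("Payment Date:", "payment date", extra spaces).
# "\s*" also matches line breaks, so the date may sit in the following block.
PAYMENT_DATE = re.compile(r"Payment\s+Date\s*:?\s*" + DATE, re.IGNORECASE)


def find_payment_date(ocr_text: str) -> Optional[str]:
    """Return the date that follows the label, or None if nothing matches."""
    match = PAYMENT_DATE.search(ocr_text)
    return match.group(1) if match else None


# Hypothetical OCR output with several dates; only the labelled one is wanted.
sample = "Invoice Date: 01/02/2024\nPayment Date:\n\n15/03/2024\nDue: 20/03/2024"
print(find_payment_date(sample))
```

This only works when the date is the next thing after the label in the OCR text. If other text ends up between the label and the date on some sources, those documents would need their own pattern, or would fall back to the user picking from the suggested dates.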
