I am trying to configure OCR to assign properties automatically to PDFs that are imported in bulk from an external file source. (The PDFs are quotations containing customer information, a quote number, a date, the sales rep's name, and a total amount.)
My questions are as follows:
1. If a PDF file has already been imported from an external source, scanned via OCR once, and registered in M-Files, is it possible to run OCR on it again to pick up additional or corrected data? (For example, if the OCR made a mistake or scanned the wrong region and I have since corrected the region position, I would rather not delete the file and import it again.)
2. If the OCR picks up a certain word in the string it reads, is it possible to select a value from a list based on that? For example, if it scans the region on the page and sees "Joe Smith", can it assign that to a "select from list" property, provided "Joe Smith" is in the list?
3. If the OCR scans something that is missing or garbled, can it be flagged somehow? (For example, by assigning "True" to another field such as "Needs checking".) I suppose this would require something to run on the server side?
4. Can you extract only the numbers from an OCR-scanned value? For example, if it sees "$49.95 + tax", can it extract just the "49.95" and assign that to a property?
5. How do you concatenate multiple OCR-scanned lines into a single line so that it can be assigned to a property?
6. Finally, will there be some kind of graphical way to define OCR scan regions in the future? (Currently I have to open the file in a graphics editor to measure the coordinates for the region box.)
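
To make questions 2 to 5 more concrete, here is a rough sketch in Python of the kind of post-processing I have in mind. This is not M-Files code and does not use any M-Files API; the function names and the `SALES_REPS` list are hypothetical and only illustrate the behaviour I am hoping to get from the OCR-to-property step.

```python
import re
from typing import Optional, Sequence

# Hypothetical value list, standing in for a "select from list" property.
SALES_REPS = ["Joe Smith", "Ann Lee"]


def match_list_value(ocr_text: str, values: Sequence[str]) -> Optional[str]:
    """Question 2: return the first list value found in the OCR text.

    Matching ignores case and collapses runs of whitespace.
    """
    normalized = " ".join(ocr_text.split()).lower()
    for value in values:
        if value.lower() in normalized:
            return value
    return None


def extract_amount(ocr_text: str) -> Optional[float]:
    """Question 4: pull the first number out of text like "$49.95 + tax"."""
    match = re.search(r"\d+(?:,\d{3})*(?:\.\d+)?", ocr_text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def join_lines(ocr_text: str) -> str:
    """Question 5: collapse several OCR lines into one, skipping blank lines."""
    return " ".join(line.strip() for line in ocr_text.splitlines() if line.strip())


def needs_checking(*values: object) -> bool:
    """Question 3: flag the document if any extracted value is missing or empty."""
    return any(v is None or v == "" for v in values)
```

So for a quotation where the rep region reads "Sales rep: joe smith" and the total region reads "$49.95 + tax", I would want the rep property set to the list item "Joe Smith", the amount set to 49.95, and "Needs checking" set to true only if one of those lookups came back empty.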