PDF Data Extraction and Automation
Extracting Information and Text from PDF Documents
Before you start, make sure that the PDF activities pack is installed in your Studio. If it is not, install it from the Manage Packages window.
Dedicated activities let you work with both large blocks of text and specific elements in PDF files.
To read a full document (or only specified pages from it), use the Read PDF Text activity, which outputs the text as a string. From there on, it's up to you what you do with the extracted text.
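As an illustration of what "doing something" with the extracted string might look like, here is a minimal Python sketch. It is not part of the PDF activities pack and does not reproduce any Studio activity; the sample text and the helper names clean_lines and lines_containing are made up for this example. It only assumes that the extracted text arrives as one string with uneven whitespace and blank lines, which is common for text pulled out of PDFs.

```python
import re


def clean_lines(extracted_text: str) -> list[str]:
    """Split extracted PDF text into trimmed, non-empty lines."""
    lines = []
    for raw in extracted_text.splitlines():
        # Collapse runs of spaces and tabs into a single space.
        line = re.sub(r"[ \t]+", " ", raw).strip()
        if line:
            lines.append(line)
    return lines


def lines_containing(lines: list[str], keyword: str) -> list[str]:
    """Return the lines that mention keyword, ignoring case."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.lower()]


# Hypothetical sample standing in for the string output of a PDF read.
sample_text = (
    "ACME Ltd.\n\n  Invoice   No:  INV-0042 \n"
    "Date:\t2024-01-15\n\nTotal due:   1,250.00 EUR\n"
)

lines = clean_lines(sample_text)
matches = lines_containing(lines, "total")
print(matches)
```

Normalizing whitespace first keeps later searches simple, because every line has a predictable shape before any keyword or pattern matching is applied.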
For text that exists only as images within .pdf files, a special activity, Read PDF with OCR, is available. It outputs the text as a string variable and lets you use your preferred OCR engine (Abbyy, Microsoft or Google) by simply dropping the engine into the activity.
The screen scraping wizard can also get text out of .pdf documents; for more information, check the Advanced UI Automation tutorial.
The Get Text activity enables you to extract text from UI elements. You should have at least some general knowledge about selectors before continuing with this part of the tutorial.
With the Anchor Base activity (using Find Element or Find Image as the anchor and Get Text as the action) you can extract a value that changes from document to document, from one or multiple .pdf files, as long as the files have the same structure.
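The idea behind anchoring, locating a stable label and then reading the changing value next to it, can be sketched on plain text in Python. This is only an analogy: Anchor Base works on UI elements in Studio, not on strings, and the function value_after_anchor, the file names and the sample texts below are hypothetical. The sketch assumes each label is followed by a colon and that the value sits on the same line.

```python
import re
from typing import Optional


def value_after_anchor(text: str, anchor: str) -> Optional[str]:
    """Return the text that follows 'anchor:' on the same line, or None."""
    # The fixed label plays the role of the anchor; the capture group
    # picks up whatever value follows the colon on that line.
    pattern = re.escape(anchor) + r"[ \t]*:[ \t]*(\S[^\n]*)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


# Hypothetical documents sharing one structure but holding different values.
documents = {
    "invoice_a.pdf": "Invoice No: INV-0042\nTotal due: 1,250.00 EUR\n",
    "invoice_b.pdf": "Invoice No: INV-0043\nTotal due: 980.50 EUR\n",
}

totals = {
    name: value_after_anchor(text, "Total due")
    for name, text in documents.items()
}
print(totals)
```

Because the lookup depends only on the label, the same call works for every file with that layout, which mirrors the requirement that the .pdf files share the same structure.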