How to extract text using OCR
About this workflow
This workflow shows how to extract text from a PDF file using different OCR (optical character recognition) engines.
First, make sure the PDF Activity Package is installed. If it is not, click the Manage Packages button in the Activities panel, select the PDF Activity Package, and install it.
Note: If you have just installed the package, close Studio and open it again.
This automation has two activities:
Activity 1. Read PDF with OCR, which can use different OCR engines such as Google OCR, MS Office OCR or ABBYY OCR. This activity takes the path of the PDF file as input and outputs a string containing the recognized text.
Activity 2. Write text file, which takes the string produced by the first activity and a file name as input, and writes the text to that file.
Once the workflow has run, you will find the text document in the same folder as the workflow's XAML file.
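
For readers who find it easier to follow the data flow in code, the two steps can be sketched in Python. This is only an illustration of the idea, not how the UiPath activities are implemented. The function names (read_pdf_with_ocr, write_text_file, run_workflow) and the default file name "output.txt" are hypothetical, and the OCR engine is modelled as any function that takes a PDF path and returns text, so that Google OCR, MS Office OCR or ABBYY OCR could each be represented by a different function.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumption: an "OCR engine" is any callable that takes a PDF path and
# returns the recognized text. Real engines would be wrapped to fit this shape.
OcrEngine = Callable[[Path], str]


def read_pdf_with_ocr(pdf_path: Path, engine: OcrEngine) -> str:
    """Step 1 (hypothetical): hand the PDF path to the engine, get a string back."""
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {pdf_path.name}")
    return engine(pdf_path)


def write_text_file(text: str, file_name: str, folder: Path) -> Path:
    """Step 2 (hypothetical): write the string to file_name inside folder."""
    out_path = folder / file_name
    out_path.write_text(text, encoding="utf-8")
    return out_path


def run_workflow(
    pdf_path: Path,
    engine: OcrEngine,
    file_name: str = "output.txt",
    folder: Optional[Path] = None,
) -> Path:
    """Chain the two steps: the output of step 1 is the input of step 2."""
    target_folder = folder if folder is not None else Path.cwd()
    text = read_pdf_with_ocr(pdf_path, engine)
    return write_text_file(text, file_name, target_folder)
```

Swapping engines then means passing a different function to run_workflow, which mirrors choosing a different OCR engine inside the Read PDF with OCR activity while leaving Write text file unchanged.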