How to extract text using OCR

About this workflow

This workflow shows how to extract text from a PDF file using different OCR (optical character recognition) engines.

Package installation

First, make sure the PDF Activity Package is installed. If it is not, click the Manage Packages button in the Activities panel, select the PDF Activity Package, and install it.

Note: If you have just installed the package, close Studio and open it again.

Process automation

This automation has two activities:

Activity 1. Read PDF with OCR, which can use different OCR engines such as Google OCR, MS Office OCR, or ABBYY OCR. This activity takes the path of the PDF file as input and outputs a string.

Activity 2. Write text file, which takes the string produced by the first activity and a file name as input.

Once the workflow has been run, you will find the text document in the same folder as the XAML document.
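
For readers who think in code, the data flow of the two activities can be sketched in Python. This is only an illustration of how the path, the string, and the output file relate to each other, not how Studio implements the activities. The function names, the `OcrEngine` type, and `stub_engine` are all hypothetical, and the stub returns placeholder text instead of performing real recognition.

```python
from pathlib import Path
from typing import Callable

# Hypothetical engine signature: takes a PDF path, returns the recognized text.
OcrEngine = Callable[[Path], str]


def stub_engine(pdf_path: Path) -> str:
    """Stand-in for a real OCR engine; returns placeholder text only."""
    return f"[recognized text of {pdf_path.name}]"


def read_pdf_with_ocr(pdf_path: Path, engine: OcrEngine) -> str:
    """Step 1: pass the PDF path to the chosen engine and return a string."""
    return engine(pdf_path)


def write_text_file(text: str, file_name: str, folder: Path) -> Path:
    """Step 2: write the string to file_name inside folder."""
    target = folder / file_name
    target.write_text(text, encoding="utf-8")
    return target


def run_workflow(pdf_path, file_name, folder, engine: OcrEngine = stub_engine) -> Path:
    """Chain the two steps: the output of step 1 is the input of step 2."""
    text = read_pdf_with_ocr(Path(pdf_path), engine)
    return write_text_file(text, file_name, Path(folder))
```

Calling `run_workflow("invoice.pdf", "output.txt", ".")` would write `output.txt` to the current folder, mirroring how the workflow leaves the text document next to the XAML file. Swapping the engine means passing a different function as `engine`, which is roughly analogous to choosing another OCR engine for the first activity.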

Download workflow example