Tony Tzeng leads product development efforts for Document Understanding at UiPath.
Processing document data quickly and accurately is vital to competing in a changing digital age. For today’s companies, success depends on the ability to easily locate, access, and understand document data. Document processing is a business-critical use case impacting productivity for every company, regardless of size, industry, or focus.
In this blog, I walk through the document processing evolution. I start by discussing digitization and optical character recognition (OCR). I then discuss how companies can extend OCR using artificial intelligence (AI)-powered document recognition to drive value through better document understanding capabilities.
Let's jump in.
Phase one: Turning offline data into online data with OCR
Traditional document processing practices are painful. Many companies still face the problems that come with non-digitized document processing, such as incorrect labeling and time lost to manual data extraction.
Companies are turning to digitization to combat such challenges. According to a 2019 M-Files survey, 41% of respondents plan to focus on replacement of paper forms with electronic forms; 70% of respondents plan to expand document processing to more born-digital documents—compared to only 39% in 2018.
Businesses specializing in document processing have embraced digitization to help companies convert physical documents into a digital format. Core to these processes is OCR. OCR technology recognizes text within physical materials and images. OCR then transforms the text into digital files like PDFs.
Solutions using OCR are critical for helping to ease document processing woes. Yet, traditional OCR technology has its limitations.
Phase two: Moving beyond online data into 'intelligent OCR'
Let's say you take a picture of a document or scan a document into your system of choice. Now, classifying and extracting data depends on the quality of the image you've scanned. Why does this matter for document processing solutions using OCR?
OCR solutions are only as effective as the quality of the underlying document allows. Challenges arise when OCR software cannot distinguish between similar characters, such as '3' versus '8,' or 'O' versus 'D.' The very errors you want to avoid by using OCR software can become new headaches when the technology cannot handle the nuances of a document's quality or original form.
That’s where AI-powered document recognition comes into play.
As AI capabilities advance, companies have begun creating and training machine learning (ML) models to apply toward OCR. Model-based OCR engines, or what we call intelligent OCR, yield significant improvements for digitizing documents and text at scale while reducing errors.
Intelligent OCR helps companies digitize documents and images that before proved a challenge for legacy OCR systems, such as handwritten letters, checkboxes, and cross-outs.
We are only beginning to discover what’s possible when we extend OCR with AI. Let's walk through some of the possibilities and outcomes you can realize as you start to use model-based solutions for digitization and document processing.
Phase three: Using AI for better data extraction and document classification
Getting documents into a digital format is the first of many steps to derive value from the document itself. Once a document is digitized, the software must understand what kind of document it's working with and which content is relevant.
Companies using traditional OCR software can struggle to scale document classification efforts. Traditional OCR engines use simple approaches, like header identification, to classify document types. This type of approach can limit a company's ability to classify documents on a granular level.
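To make the limitation concrete, here is a minimal sketch of header-based classification. It is an illustration only, not how any particular OCR product works; the keyword list and the function name `classify_by_header` are assumptions made up for this example.

```python
# Hypothetical header keywords; a real system would maintain its own list.
HEADER_KEYWORDS = {
    "invoice": "invoice",
    "purchase order": "purchase_order",
    "receipt": "receipt",
}


def classify_by_header(ocr_text: str) -> str:
    """Classify a document by looking only at its first non-empty line."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    header = lines[0].lower() if lines else ""
    for keyword, doc_type in HEADER_KEYWORDS.items():
        if keyword in header:
            return doc_type
    return "unknown"


# The header names the document type, so classification works.
print(classify_by_header("INVOICE\nAcme Corp\nTotal: 120.00"))

# Same kind of document, but the header is a company name.
print(classify_by_header("Acme Corp\nTax Invoice No. 17\nTotal: 120.00"))
```

The second document is still an invoice, but because the rule reads only the header, it falls through to "unknown". That is the granularity problem in miniature.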
Once documents are classified using a traditional OCR solution, companies are often confined to document templates: predefined “recipes” that specify which fields to extract from the digitized text, along with “rules” for finding each field in the document. You can create rules based on recurring patterns in the data, a position within a document, or a position relative to something else that's easy to find in the document, such as a logo. While templates are a natural starting point, they’re static.
As document processing efforts scale, companies end up investing in template management and new template creation to deal with document variants that the initial implementation did not cover.
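A rough sketch of what a pattern-based template can look like follows. The field names, the regular expressions, and the `extract_with_template` helper are all hypothetical and chosen only for illustration; positional rules are left out to keep the example short.

```python
import re

# A hypothetical template for one invoice layout: field name -> pattern rule.
INVOICE_TEMPLATE = {
    "invoice_number": re.compile(r"Invoice No\.\s*(\w+)"),
    "total": re.compile(r"Total:\s*\$?([\d,]+\.\d{2})"),
}


def extract_with_template(ocr_text, template):
    """Apply each rule; a field is None when its pattern is not found."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(ocr_text)
        result[field] = match.group(1) if match else None
    return result


known_layout = "Invoice No. A1001\nTotal: $1,250.00"
variant_layout = "Invoice # A1002\nAmount due: $980.00"

print(extract_with_template(known_layout, INVOICE_TEMPLATE))
print(extract_with_template(variant_layout, INVOICE_TEMPLATE))
```

The template extracts both fields from the layout it was written for and returns nothing for the variant, even though the variant carries the same information. Supporting it means writing and maintaining another template.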
Taking advantage of AI in document classification and data extraction changes this dynamic to make processes easier.
Once you have your data in a digital format, you can use trained models to look deeper into documents to classify document types and extract relevant information in a structured manner.
Model-based OCR solutions can identify a document type and match it against a known document type used by your business. They can also parse and understand blocks of text in unstructured documents. Once the solution knows more about the document itself, it can begin to extract relevant information based on intent and meaning. And, it can deal with changes and variants in your documents.
Rather than creating templates, you can define the fields you want—the document's taxonomy—and then teach the ML model how to find these fields. The model is then able to adjust itself based on the incoming documents and learn from human validations of processed documents.
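The sketch below illustrates the idea of defining a taxonomy and letting a model improve from human validations. It is deliberately a toy: `LabelLearner` is a hypothetical stand-in that only memorizes which text labels introduced each field, whereas a real ML model would generalize far beyond exact label matches. The field names and sample documents are assumptions as well.

```python
from collections import defaultdict

# The taxonomy: the fields we want, independent of any layout (hypothetical names).
TAXONOMY = ["invoice_number", "total"]


class LabelLearner:
    """Toy stand-in for an ML extractor: it remembers which text labels
    introduced each field in documents a human has validated."""

    def __init__(self, taxonomy):
        self.taxonomy = list(taxonomy)
        self.labels = defaultdict(set)  # field -> label strings seen so far

    @staticmethod
    def _label_value_pairs(text):
        # Treat every "label: value" line as a candidate field.
        for line in text.splitlines():
            if ":" in line:
                label, value = line.split(":", 1)
                yield label.strip().lower(), value.strip()

    def extract(self, text):
        found = {field: None for field in self.taxonomy}
        for label, value in self._label_value_pairs(text):
            for field in self.taxonomy:
                if label in self.labels[field]:
                    found[field] = value
        return found

    def learn_from_validation(self, text, validated):
        """`validated` maps field -> the value a human confirmed."""
        for label, value in self._label_value_pairs(text):
            for field, confirmed in validated.items():
                if field in self.taxonomy and value == confirmed:
                    self.labels[field].add(label)


model = LabelLearner(TAXONOMY)
model.learn_from_validation(
    "Invoice No: A1001\nTotal: 1250.00",
    {"invoice_number": "A1001", "total": "1250.00"},
)

# A new layout the model has not seen: nothing is extracted yet.
variant = "Invoice #: A1002\nAmount due: 980.00"
before = model.extract(variant)
print(before)

# A human validates the variant once; later documents in that layout work.
model.learn_from_validation(variant, {"invoice_number": "A1002", "total": "980.00"})
after = model.extract("Invoice #: A1003\nAmount due: 410.50")
print(after)
```

The point of the design is that nobody writes a new template for the variant. The taxonomy stays fixed, and one human validation is enough for this toy learner to handle the new layout.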
Having these capabilities creates greater flexibility and scalability in your document processing solution. The outputs also open new doors for what you can do with the data itself.
Phase four: Empowering new insights and action using AI
Using AI for document classification and data extraction is a massive step along the journey to empower your organization with automated and accurate document processing capabilities. As you look longer-term, you can begin to build out a roadmap to take advantage of AI capabilities and do more with the text you extract.
With AI, you can catch errors by cross-referencing data from multiple documents or from various backend systems. For example, let's say an invoice amount is incorrect, but the error did not come from the OCR process. To find the root of the problem, you can use a combination of robots to extract data across many document types and systems. This helps you cross-check data and surface exceptions and errors that generally fall outside the domain of the OCR process itself.
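As a simplified illustration of this kind of cross-check, the sketch below compares an extracted invoice with the matching purchase order. The record shapes, the field names `po_number` and `total`, and the `cross_check_invoice` helper are assumptions for the example; in practice the purchase order would come from a backend system rather than a literal dictionary.

```python
from decimal import Decimal


def cross_check_invoice(invoice, purchase_order, tolerance=Decimal("0.01")):
    """Return a list of exceptions found by comparing two records."""
    exceptions = []
    if invoice["po_number"] != purchase_order["po_number"]:
        exceptions.append("PO number mismatch")
    # Decimal avoids floating-point surprises when comparing money amounts.
    difference = abs(Decimal(invoice["total"]) - Decimal(purchase_order["total"]))
    if difference > tolerance:
        exceptions.append(f"total differs from purchase order by {difference}")
    return exceptions


# Extracted correctly by OCR, but the supplier typed the wrong amount.
invoice = {"po_number": "PO-7731", "total": "1250.00"}
purchase_order = {"po_number": "PO-7731", "total": "1205.00"}

issues = cross_check_invoice(invoice, purchase_order)
print(issues)
```

The OCR step read the invoice faithfully, so nothing in the document alone reveals the problem. The discrepancy only surfaces once a second source is consulted.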
You can also begin to apply AI capabilities to data sets over time and with historical context to make predictions and identify potential anomalies that may indicate fraud. Let's walk through an insurance claims processing example. The first step is to digitize an incoming claim. You then extract relevant information (such as claim date, nature, and amount) from the claim. Next, you can look at these data points and use an ML model to identify specific claims that may be fraudulent given variables like recurrences and suspicious amounts.
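The last step of that example can be sketched as follows. To stay self-contained, this sketch swaps the ML model for two simple statistical rules (a z-score on the amount and a count of claims per claimant); a production system would more likely use a trained model. The thresholds, field names, and sample data are all made up for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def flag_suspicious_claims(claims, history_amounts, z_threshold=3.0, max_repeats=2):
    """Flag claims with unusually high amounts or frequent repeat claimants."""
    avg = mean(history_amounts)
    spread = stdev(history_amounts)
    per_claimant = Counter(claim["claimant"] for claim in claims)

    flagged = []
    for claim in claims:
        reasons = []
        # Suspicious amount: far above what historical claims look like.
        if spread > 0 and (claim["amount"] - avg) / spread > z_threshold:
            reasons.append("unusually high amount")
        # Recurrence: the same claimant appears more often than allowed.
        if per_claimant[claim["claimant"]] > max_repeats:
            reasons.append("recurring claims")
        if reasons:
            flagged.append((claim["claim_id"], reasons))
    return flagged


history = [900, 1100, 1000, 950, 1050]  # amounts from past, settled claims
claims = [
    {"claim_id": "C1", "claimant": "alice", "amount": 1020},
    {"claim_id": "C2", "claimant": "bob", "amount": 5000},
    {"claim_id": "C3", "claimant": "carol", "amount": 980},
    {"claim_id": "C4", "claimant": "carol", "amount": 1010},
    {"claim_id": "C5", "claimant": "carol", "amount": 990},
]

flagged = flag_suspicious_claims(claims, history)
for claim_id, reasons in flagged:
    print(claim_id, reasons)
```

Flagged claims are candidates for human review, not verdicts. The value of the approach is in narrowing thousands of claims down to the few that deserve a closer look.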
AI makes the execution of these types of tasks possible.
Taking the next steps toward document processing bliss
Document processing doesn't need to be painful. Starting with OCR and extending OCR with AI can make document processing a more valuable, and less tedious, part of your process.
We’re passionate about helping clients use AI to simplify processes and make life easier.
Do you want to learn more about how we can help your company simplify and enhance its document processing practices and optimize document understanding with AI? Claim your complimentary copy of our white paper Increase Operational Efficiency and Mitigate Risks with Document Understanding.
Want to see these capabilities in action? Start your free trial of UiPath Enterprise Cloud.
About the author: Tzeng previously worked as a product lead at Microsoft, developing customer service virtual agent products. He holds degrees from The Wharton School and Stanford.
Director of Product, Document Understanding, UiPath