George Roth is senior manager of technology partnerships at UiPath.
Document understanding aims to unlock data trapped in documents, giving your organization more accurate extracted data, higher productivity, and a growing return on investment (ROI) from Robotic Process Automation (RPA). It lies at the intersection of document processing and artificial intelligence (AI), which together contribute to a future where almost everything can be automated.
The document understanding ecosystem includes technologies that can interpret information and meaning from a wide range of document types, including handwriting, checkboxes, and stamps. Machine learning (ML) is spurring continuous innovation in document understanding, which is one of the fastest-growing areas of automation.
Organizations may already be working with specific solutions or providers. Still, they may need other technologies or new expertise to expand document understanding to other business functions. It can be hard to find a vendor with a universal solution that works with all kinds of documents. Vendors typically focus on particular types of documents or industries, such as insurance, finance, and healthcare. Some also offer ML-based solutions, such as pre-trained models for specific documents, yet those models cannot easily be modified to fit documents outside of those domains.
Because no single vendor currently covers every document imaginable, UiPath offers businesses a way to address both the challenges and the possibilities of automated document processing. To begin with, there are our native AI capabilities, which you can try out through the UiPath Enterprise trial.
UiPath Enterprise RPA Platform capabilities are enhanced by complementary partner offerings that allow smooth end-to-end document processing. These offerings are available in the UiPath Connect Marketplace. The Marketplace offers an open ecosystem with partner solutions that, combined with the UiPath RPA Platform, can address a wide range of use cases.
Let’s take a closer look at these ecosystem technologies and top vendors providing them.
Multiple technologies can unlock the power of document understanding
These are some of the most commonly used technologies in document understanding, along with UiPath partners who build solutions around them:
Optical character recognition (OCR)
OCR converts images of typed, handwritten, or printed text into machine-encoded text that can be further processed to extract the desired data. The technology usually extracts information about the layout and structure of the content as well. You may occasionally have been slowed down by PDF documents in which you cannot copy or search text because the pages are stored as images. Similarly, you may have a scan, a photo, or a screenshot of a receipt, for example, in a typical graphic format like JPEG or TIFF. OCR can collect the text from these files so that no one has to read through every document manually.
Many of the best-known OCR engines on the market are integrated with UiPath. These include ABBYY FineReader, Tesseract (an open source OCR provided by Google), Kofax OmniPage, Microsoft OCR, and Google OCR. Additionally, UiPath Document OCR has recently been released as another great choice for customers.
Template-based extractors (TBEs)
TBEs extract data using fixed rules that are applied to templates created by a user or a machine. TBEs may not work for documents whose structure changes frequently or requires many template variations. This means they’re not a good option when you work with many different organizations and have to handle the varied invoice or receipt templates they send. At the same time, the technology is ideally suited for managing a relatively small number of templates for stable documents. Don’t hesitate to go for it when you have a predefined set of fixed templates and no exceptions are expected. When a document format does change, it’s easy to update the template manually.
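To make the idea of fixed rules concrete, here is a minimal Python sketch of a template applied to text that an OCR engine has already produced. It is an illustration only, not the UiPath template extractor or any vendor's API: the template dictionary, the field names, and the sample invoices are all hypothetical.

```python
import re

# Hypothetical template: one fixed regex rule per field, tuned to a single invoice layout.
INVOICE_TEMPLATE = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*:\s*(\S+)"),
    "invoice_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total\s*:\s*\$?([\d,]+\.\d{2})"),
}


def extract_with_template(text, template):
    """Apply each fixed rule to the OCR text; fields with no match come back as None."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


# A document that follows the layout the template was built for.
sample_text = """ACME Corp
Invoice No: INV-1042
Date: 2020-03-15
Total: $1,250.00
"""

# The same information in a different layout: the fixed rules no longer apply.
changed_layout = """ACME Corp
Ref# INV-1042 issued 15 March 2020
Amount due 1,250.00 USD
"""

fields = extract_with_template(sample_text, INVOICE_TEMPLATE)
missed = extract_with_template(changed_layout, INVOICE_TEMPLATE)
print(fields)
print(missed)
```

The second document carries the same data, but every rule misses because the wording and positions changed. This is why templates suit a small set of stable layouts and struggle when layouts vary.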
There are many vendors offering TBEs. When evaluating which solution to choose, pay attention to how easy it is to set up a template and how much the extraction results depend on image quality. Some of the best companies offer technologies that create templates semi-automatically, using a human-in-the-loop process in which a person only confirms the suggested choice.
A great TBE example is ABBYY FlexiCapture, which is integrated into UiPath Studio. There is also a UiPath template extractor that is available as part of UiPath Document Understanding.
Supervised-learning-based machine learning extractors (SMLEs)
SMLEs can be used for structured and semi-structured documents. Semi-structured documents may not have a strict layout like structured documents, but they contain similar content. Invoices and purchase orders are good examples. SMLEs are trained on a set of labeled sample documents, where labeling means associating each data element to be extracted with the area of the document where it appears.
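As a rough illustration of what labeling produces, the Python sketch below represents each label as a field name tied to a page region. The class names, coordinates, and sample documents are hypothetical and do not reflect any vendor's training format. The snippet only counts labeled examples per field and does not train a model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Label:
    field: str                       # data element to extract, e.g. "total"
    page: int                        # 1-based page number
    box: Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
    text: str                        # the value found inside that area


@dataclass
class LabeledDocument:
    doc_id: str
    labels: List[Label]


def label_counts(documents):
    """Count how many labeled examples each field has across the sample set."""
    counts = Counter()
    for doc in documents:
        for label in doc.labels:
            counts[label.field] += 1
    return counts


# Two hypothetical invoices with different layouts but the same kinds of fields.
samples = [
    LabeledDocument("invoice-001", [
        Label("invoice_number", 1, (420, 80, 560, 100), "INV-1042"),
        Label("total", 1, (430, 700, 540, 722), "1,250.00"),
    ]),
    LabeledDocument("invoice-002", [
        Label("total", 1, (60, 910, 170, 930), "89.90"),
    ]),
]

counts = label_counts(samples)
print(counts)
```

A count like this shows which fields are thinly covered by the sample set before any training starts.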
Currently, UiPath has ML-based extractors for invoices, receipts, and purchase orders. Other pre-trained models will be available soon. UiPath is also integrated with ABBYY FlexiCapture Distributed and FlexiCapture for Invoices, which leverage pre-trained ML models for invoices and similar documents. Further, UiPath is integrated with Hyperscience, Ephesoft, Vidado, Rossum, Omnius, Microsoft Form Recognizer, and Amazon Textract. All of these integrations offer techniques for structured and semi-structured documents.
When considering SMLE options, ask the vendor how many samples are required to train the models. If the number is large, the process could entail a high cost due to the labeling tasks and the need for lots of samples.
Unsupervised learning (USL)
This technique analyzes a data set without requiring pre-labeled data. USL uses pre-trained models or other computer-friendly knowledge representations to process unstructured documents. Common use cases include analyzing financial statements, contracts, and emails.
UiPath has several partners offering USL solutions, including Indico, SortSpoke, Botminds AI Technologies, and Xtracta. Indico, for example, offers a computer-assisted labeling tool that suggests labels associated with data in the documents. All the user needs to do is approve or overwrite them.
Natural language processing (NLP)
NLP technologies help computers understand human language. NLP is often combined with other technologies to perform a range of tasks. It allows organizations to perform text analysis and entity extraction, and to automate processes by identifying intent in unstructured documents like emails. If you want to extract the Begin Date and Finish Date from an unstructured document, you have to be able to map the work timeline, because the same date can appear under many synonymous names. NLP helps you do this because it can recognize and analyze synonyms. It can also analyze the sentiment of a text, in other words, determine whether it’s positive, negative, or neutral. This may be especially valuable for interpreting the content of news, social media, or correspondence. NLP partners and technologies integrated with UiPath include Expert System, Amazon Comprehend, and the Stanford NLP Group.
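The Begin Date and Finish Date example can be sketched in a few lines of Python. This is a deliberately simplified stand-in: a real NLP system would infer synonyms from language models or knowledge bases, whereas here the synonym table is hand-written and hypothetical, as are the field names and the sample sentence.

```python
import re

# Hypothetical synonym table, listed by hand for illustration only.
FIELD_SYNONYMS = {
    "begin_date": ["begin date", "start date", "commencement date", "effective date"],
    "finish_date": ["finish date", "end date", "termination date", "expiration date"],
}

# This sketch only recognizes ISO-style dates such as 2020-04-01.
DATE = r"(\d{4}-\d{2}-\d{2})"


def extract_dates(text):
    """Map whichever synonym appears in the text to its canonical field."""
    found = {}
    for field, phrases in FIELD_SYNONYMS.items():
        for phrase in phrases:
            # Allow a few non-digit characters ("is", "shall be", ":") before the date.
            pattern = re.compile(re.escape(phrase) + r"\D{0,20}?" + DATE, re.IGNORECASE)
            match = pattern.search(text)
            if match:
                found[field] = match.group(1)
                break
    return found


email = "The Commencement Date is 2020-04-01 and the Termination Date shall be 2021-03-31."
dates = extract_dates(email)
print(dates)
```

Neither "Begin Date" nor "Finish Date" appears in the sentence, yet both fields are filled because their synonyms are mapped to the same canonical names.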
Emerging alternatives—business process outsourcing and human in the loop
Along with the established technologies and companies listed above, there’s an emergence of vendors offering business process outsourcing (BPO) and human-in-the-loop (HITL) processes to enhance document understanding.
For example, Ocrolus and Contract Wrangler have powerful ML-based technologies for document understanding. In addition, they engage a crowdsourced human workforce that helps correct document extraction results that don’t meet the desired accuracy threshold. Both companies are disruptive because they guarantee up to 99.99% accuracy and commit to delivery times. Of course, higher accuracy and shorter time requirements may entail higher costs for customers.
Additionally, the UiPath Document Understanding solution provides the Validation Station. This tool lets users review and, if necessary, correct document classification and automatic data extraction results.
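The human-in-the-loop pattern described in this section can be sketched as threshold-based routing. The Python below is a hedged illustration and not the Validation Station or any vendor's API: the function names, the 0.9 threshold, and the confidence scores are assumptions made for the example.

```python
def route_extractions(extractions, threshold=0.9):
    """Split extracted fields into auto-accepted ones and ones that need human review.

    `extractions` maps a field name to a (value, confidence) pair.
    """
    accepted, needs_review = {}, {}
    for field, (value, confidence) in extractions.items():
        target = accepted if confidence >= threshold else needs_review
        target[field] = value
    return accepted, needs_review


def apply_corrections(accepted, needs_review, corrections):
    """Merge reviewer corrections; reviewed fields without a correction keep their value."""
    final = dict(accepted)
    for field, value in needs_review.items():
        final[field] = corrections.get(field, value)
    return final


# Hypothetical extractor output: the total was misread ("O" instead of "0").
extractions = {
    "invoice_number": ("INV-1042", 0.98),
    "total": ("1,25O.00", 0.62),
}

accepted, needs_review = route_extractions(extractions)
final = apply_corrections(accepted, needs_review, {"total": "1,250.00"})
print(accepted, needs_review, final)
```

Raising the threshold sends more fields to reviewers, which mirrors the trade-off noted above between higher accuracy and higher cost.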
Final thoughts on choosing a solution
Choosing a solution that meets all your business needs for document understanding can pose a big challenge. It usually leads to evaluating options for implementing a few solutions simultaneously and looking for the best ways to integrate them. This is why UiPath works and integrates with a wide range of industry-leading vendors. We’ve established a rich document understanding ecosystem that complements the UiPath RPA Platform.
For more in-depth information, join our webinar Product Spotlight: AI-Enhanced Automations – Combining Transformative Capabilities. You’ll see how UiPath Document Understanding and other ML-based solutions can help take your automations to a whole new level—powered by AI. You can also try out these capabilities to help automate your business processes by registering for the UiPath Enterprise trial.