When RPA Met AI: The Rise of Cognitive Automation


Editor's note: This post was published in 2018. Get up-to-date information on artificial intelligence (AI) and UiPath AI capabilities on our website.

Original post:

In the age of the fourth industrial revolution, our customers and prospects know that to survive, they need to digitize their operations rapidly. Traditionally, business process improvements were multi-year efforts that required an overhaul of enterprise business applications and workflow-based process orchestration. However, the last few years have seen a surge in Robotic Process Automation (RPA), driven by RPA’s ability to rapidly automate business processes without disrupting existing enterprise applications. Today, it’s artificial intelligence’s (AI) turn to prove itself.

AI, so close and yet so far


Venu Kannan, Chief Solutions Officer UiPath

Typical use cases for AI in the enterprise range from front-office to back-office analytics applications. A recent study by McKinsey noted that customer service, sales and marketing, supply chain, and manufacturing are among the functions where AI can create the most incremental value. McKinsey predicts that AI can create a global annual profit in the range of $3.5 trillion to $5.8 trillion across the nine business functions and 19 industries studied in its research. Despite the tremendous potential of AI, the study also notes that only a few pioneering firms have adopted AI at scale. Key among the adoption limitations are the availability of massive data sets, generalized learning, regulation, and social acceptance due to potential bias in algorithms.

Most applications of AI today are focused on narrowly defined tasks, such as predicting machine failure rates, text analytics for sentiment detection, or facial image recognition. The field of AI is continuing to make foundational advances towards human-level Artificial General Intelligence (AGI). AGI is the fuzzy horizon beyond which a machine will be able to successfully perform any intellectual task that a human can. AGI tasks include learning, planning, and decision-making under uncertainty, communicating in natural language, making jokes or even… reprogramming itself.

The field of AI that is most relevant for UiPath is deep learning. Much of the recent boom in AI can be attributed to the application of deep neural networks over the past decade. On the one hand, convolutional neural networks, a specialized type of deep neural network, are designed specifically for taking images as input and are effective for computer vision tasks, an area where UiPath invests heavily. On the other hand, recurrent neural networks are well suited to language problems. They are also important in reinforcement learning, since they enable the machine to keep track of where things are and what happened historically. In reinforcement learning, the machine learns from experience: it collects training examples through trial and error as it attempts its task, with the goal of maximizing long-term reward.

Data, the main ingredient

Let’s look at how we can embed AI into RPA. Any enterprise process can be defined as consisting of the following sequence: Data -> Judgment -> Action. Leading companies are leveraging AI to make the most of all the data that is available to them by adding prediction as a step in the sequence, leading to: Data -> Prediction -> Judgment -> Action. Understanding the complexities and challenges of these steps will be critical to solving the AI/cognitive puzzle when it comes to enterprise automation. Each of these four steps presents challenges that typically lead to more manual work.
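The Data -> Prediction -> Judgment -> Action sequence can be sketched as a chain of four functions. This is a minimal illustration in Python, not UiPath code: the function names, the toy "risk" model, and the semicolon-separated input format are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    label: str
    confidence: float


def run_process(raw: str,
                extract: Callable, predict: Callable,
                judge: Callable, act: Callable) -> str:
    """Run one case through Data -> Prediction -> Judgment -> Action."""
    data = extract(raw)                  # Data: turn raw input into fields
    prediction = predict(data)           # Prediction: estimate something unknown
    decision = judge(data, prediction)   # Judgment: pick a course of action
    return act(decision)                 # Action: carry it out


def extract(raw: str) -> dict:
    # Hypothetical input format: "invoice=INV-7;amount=120.50"
    return dict(field.split("=", 1) for field in raw.split(";"))


def predict(data: dict) -> Prediction:
    # Stand-in for a trained model: small invoices are assumed to be low risk.
    low_risk = float(data["amount"]) < 1000
    return Prediction("low_risk" if low_risk else "high_risk", 0.9)


def judge(data: dict, prediction: Prediction) -> str:
    return "approve" if prediction.label == "low_risk" else "escalate"


def act(decision: str) -> str:
    return f"action taken: {decision}"


result = run_process("invoice=INV-7;amount=120.50", extract, predict, judge, act)
```

Keeping each step behind its own function makes it easier to see where manual work creeps in, and to replace one step (for example, the prediction model) without touching the others.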

What’s the main currency for companies nowadays? It's data. Companies like Google, Facebook, and Microsoft have spent billions to build technologies to capture and store data. Data, the first step in enterprise automation, involves securing important data and converting it into meaningful information or insights for further processing. This step also involves interaction with the customer (internal or external). A typical data step of a process includes data extraction, data transformation, and data cleansing. Data belonging to enterprises can be classified into structured data, unstructured data, and conversational data.

[Figure: the AI life cycle for developing and deploying machine learning]

Structured data

Machine-understandable and queryable, structured data fits nicely into a relational SQL database and works well with basic algorithms. Most companies use structured data well. Automating a downstream process that accepts structured data is easier and has a better success rate.
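To make "query-able" concrete, here is a small sketch using Python's built-in sqlite3 module. The invoices table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "invoice_id TEXT PRIMARY KEY, vendor TEXT, amount REAL, paid INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        ("INV-1", "Acme", 120.50, 1),
        ("INV-2", "Acme", 980.00, 0),
        ("INV-3", "Globex", 45.25, 0),
    ],
)

# Because every row has the same typed columns, a downstream step can ask
# a precise question without any interpretation of the data.
unpaid_by_vendor = conn.execute(
    "SELECT vendor, SUM(amount) FROM invoices "
    "WHERE paid = 0 GROUP BY vendor ORDER BY vendor"
).fetchall()
conn.close()
```

A rules-based robot can consume a result like this directly, which is one reason automation built on structured data tends to succeed more often.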

Unstructured data

Like spoken language, unstructured data is difficult or even impossible for algorithms to interpret. Most companies struggle to extract information from unstructured data, although the potential to achieve zero-touch operations lies in their ability to handle it. This class of data consists of several subgroups: unstructured images in document form, unstructured text, unstructured images in picture form, unstructured audio, and unstructured video. Each subgroup may pose different challenges and call for different technical solutions when it comes to extraction.


Unstructured images (documents) require OCR/ICR capabilities to extract the data. If an image has a consistent format, such as payable invoices, payment remittance, etc., then these images can be converted using OCR/ICR technologies, and the output will be readily consumable by the downstream process. If the format is inconsistent, then OCR/ICR technologies will deliver unstructured text data, which needs further processing.
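For the consistent-format case, the step after OCR can be as simple as pattern matching on the recognized text. The sketch below assumes the OCR engine has already produced plain text. The field labels ("Invoice No", "Due Date", "Total Due") and the sample invoice are hypothetical, and real layouts would need their own patterns.

```python
import re

# One pattern per field, written for a single known invoice layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "due_date": re.compile(
        r"Due\s*Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(
        r"Total\s*(?:Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text: str) -> dict:
    """Return each known field, or None when its pattern does not match."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


sample = """ACME SUPPLIES
Invoice No: INV-2041
Due Date: 2018-09-30
Total Due: $1,250.00
"""

fields = extract_fields(sample)
```

When the layout is inconsistent, the patterns simply fail to match and the fields come back as None, which is the signal that the text needs the further processing described above.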

Unstructured text is another subgroup. It requires natural language processing technologies (e.g., Intellidact, Instabase) to interpret the attributes that are relevant to understanding the data, namely context, entities, person, place, etc.

Unstructured images (pictures) are input documents where a picture needs to be interpreted to extract information. For example, an engineering diagram of a building may need to be converted into a bill of materials rapidly because of the competitive nature of the bid process. Unstructured images require vision technologies to convert them into data.

Unstructured audio helps companies in particular scenarios, such as analyzing customer calls to understand satisfaction level. Finally, there are unstructured videos, with data inputs that are seldom used in companies, and where technology still has a lot of catching up to do to interpret them.

Enterprises hold significantly more unstructured data than structured data. Some predict that by the year 2020, over 90% of all data in the enterprise will be unstructured.

Decisions, decisions... 

The next step in cognitive automation is judgment. This step involves combining information with past trends and rules to decide on a course of action. It can be split into two types: rules-based judgment and trends-based judgment.

Rules-based judgment

Rules-based judgment involves decision-making based on configurable rules. For example, a payable invoice is compliant if it contains a defined set of key information. These rules can quite easily be configured to deliver touch-free automation. Much of the decision-making in an enterprise process is rules-based once all the data is available in a consistent format.
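The invoice compliance example can be sketched as a configurable list of required fields. The field names here are assumptions chosen for illustration, and a real compliance rule set would likely check more than presence.

```python
# The rule is plain configuration: change the tuple to change the rule.
REQUIRED_FIELDS = ("invoice_number", "vendor", "amount", "due_date")


def check_invoice(invoice: dict, required_fields=REQUIRED_FIELDS):
    """Return (is_compliant, missing_fields) for one invoice record."""
    missing = [
        name for name in required_fields
        if invoice.get(name) in (None, "")
    ]
    return (not missing, missing)


complete = {
    "invoice_number": "INV-2041",
    "vendor": "Acme",
    "amount": 1250.00,
    "due_date": "2018-09-30",
}
incomplete = {"invoice_number": "INV-2042", "vendor": "Acme", "amount": 80.00}

ok_result = check_invoice(complete)
failed_result = check_invoice(incomplete)
```

Returning the list of missing fields, not just a yes or no, gives the downstream action something specific to do, such as requesting the missing due date from the vendor.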

Trends-based judgment

This category involves decision-making based on past patterns, such as the decision to write off short payments from customers. While much of the fuzzy decision-making in an enterprise process can be codified, special events (e.g., marketing campaigns, period-ends, cash position) can call for intuition-based decisions that can be learned through experience but cannot be documented as rules. Humans play a vital role in such areas by being fast and accountable.

While many trends-based judgment decisions will need human input, we see that AI will reduce the need for some processing exceptions by predicting the best decision. These predictions can be acted on automatically when the confidence level is high enough, or may need a human in the loop, whose input also improves the models, when the confidence level does not meet the threshold for automation.
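Confidence-based routing of this kind can be sketched in a few lines. The 0.85 threshold, the class name, and the case data are illustrative assumptions. In practice the threshold would be tuned per process, and the reviewed cases would feed back into model training.

```python
from dataclasses import dataclass, field


@dataclass
class ConfidenceRouter:
    threshold: float = 0.85
    # Cases waiting for a person. Their decisions can later be used as
    # labeled examples to improve the model.
    review_queue: list = field(default_factory=list)

    def route(self, case_id: str, predicted_decision: str,
              confidence: float) -> str:
        """Automate confident predictions; queue the rest for a human."""
        if confidence >= self.threshold:
            return "automated"
        self.review_queue.append((case_id, predicted_decision, confidence))
        return "human_review"


router = ConfidenceRouter()
first = router.route("short-pay-001", "write_off", 0.97)
second = router.route("short-pay-002", "write_off", 0.61)
```

Here the first short payment is written off without human touch, while the second waits in the review queue with the model's suggestion attached.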

And... action!

The last step in the process involves taking an action based on the outcome of the preceding steps. Action can be system-based (e.g., automatic data transfer, data processing, or email communication) or conversational (e.g., presenting a board pack in a meeting).

To deliver truly end-to-end automation, UiPath will invest heavily across the data-to-action spectrum.

Where we’re heading

Our customers today use our product to perform rules-based automation, which speeds up processing and reduces error rates. However, most initiatives tied to RPA are tactical and focused on cost-cutting.

They shouldn’t be. We support disruptive ways to transform business processes through the introduction of cognitive automation within our technology.

We’ve invested heavily in image recognition and will continue to do so by incorporating deep learning in our platform to enable the robots to understand any screen, similar to the way humans do. Our image recognition engine uses powerful algorithms that are optimized to find images on screen in under 100 milliseconds.

Emails, annual statements, contracts, or other types of documents hold data that needs to be extracted through keywords and logically organized for the robots to drive decisions accordingly. This can be achieved through machine learning, an area where UiPath invests heavily. We have already enabled Python development within our platform and will continue to enable machine learning models to be executed, maintained and managed in UiPath. Along the way, we will also support predictive decisioning technology which has self-learning capabilities.

Another key investment is related to language, spanning from natural language understanding to natural language generation. The business applications of the future will be less form-based and more interaction-based. This wave has already started in the contact center market. With 20% of searches performed on mobile being voice-based, conversational interactions are set to become increasingly pervasive even in an enterprise context. UiPath tightly integrates cognitive technology from Stanford NLP, Microsoft, Google, and IBM Watson and has just announced a strategic partnership with Google Cloud Contact Center AI to deliver a no-touch center automation solution.

There is a lot of excitement about how RPA can be used to automate more processes by discovering opportunities automatically. Today, we have a partnership with Celonis for process discovery. Concurrently, we are researching new possibilities to auto-generate process templates by studying in great detail the user-machine interaction and all of its traces in the system.

Of all these investments, some will be built within UiPath and others will be made available through tightly integrated partner technologies. To drive true digital transformation, you’ll need to find the right balance between the best technologies available. They are many and they are varied. But RPA can be the platform to introduce them one by one and manage them easily in one place.

If you have an interest in knowing more, feel free to get in touch with us.

Venu Kannan

VP, Professional Services - Americas, UiPath
