UPDATE: This blog was originally published in February 2019 to announce the public preview. We're excited to share that UiPath AI Computer Vision is now publicly available. You can learn more on our AI Computer Vision page.
One of our driving tenets on the Artificial Intelligence (AI) team at UiPath is something we call “Pragmatic AI”: teaching our Robots AI skills to solve complex problems for our customers in the most effective way.
Reliably automating virtual desktop infrastructure (VDI) environments such as Citrix, VMware, VNC, and Windows Remote Desktop has always been a tough nut to crack in Robotic Process Automation (RPA). Hundreds of thousands of businesses globally use VDIs, and virtualization in enterprises is growing by the day.
Finding simple solutions to complex problems is certainly not an easy task. Good things come to those who wait, so let me just say I’m super excited to announce the public preview of what we believe is a true breakthrough for the RPA industry: the new UiPath AI Computer Vision capability built on deep learning.
The specific challenge in automating VDI environments is RPA’s traditional reliance on selectors. Selectors use the underlying properties of user interface (UI) elements, and they work well for identifying application elements (such as buttons and text fields) when automating native desktop systems. However, this method completely breaks down when trying to automate the same software in a VDI environment.
The reason for the breakdown is that VDIs stream an image of the remote desktop, similar to how video-streaming services like Netflix do. There are simply no selectors to be identified in “video.”
Attempts to solve this challenge have used optical character recognition (OCR) and image matching, but these approaches have led to reliability and maintenance issues, because even minor changes in the UI break the automations.
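To make the fragility of image matching concrete, here is a minimal sketch in Python. It is a toy illustration, not code from any RPA product: the grayscale "screenshots", the `paint` helper, and the `find_exact` function are all hypothetical names invented for this example. It shows an exact template search that finds a button in the original UI and then misses it once the button's shade changes slightly.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values 0-255


def find_exact(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return (top, left) of the first exact pixel match, or None."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    for top in range(sh - th + 1):
        for left in range(sw - tw + 1):
            if all(
                screen[top + r][left:left + tw] == template[r]
                for r in range(th)
            ):
                return (top, left)
    return None


def paint(height: int, width: int, shade: int, top: int, left: int) -> Image:
    """Build a toy screenshot: black background with a 2x3 'button'."""
    img = [[0] * width for _ in range(height)]
    for r in range(top, top + 2):
        for c in range(left, left + 3):
            img[r][c] = shade
    return img


# Template captured from the original UI (button shade 200).
template = [[200] * 3 for _ in range(2)]

original = paint(6, 8, shade=200, top=2, left=3)
restyled = paint(6, 8, shade=190, top=2, left=3)  # same button, new shade

print(find_exact(original, template))  # (2, 3)
print(find_exact(restyled, template))  # None
```

Real image matching usually tolerates some difference through a similarity threshold, but the maintenance problem is the same in kind: the automation is tied to how the element looks in one particular rendering.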
There has simply been no solution available in the market to enable effective automation of VDI environments. Until now.
UiPath solves the challenges discussed above with an AI Computer Vision algorithm that enables human-like recognition of user interfaces, using a mix of AI, OCR, fuzzy text matching, and an anchoring system to tie it all together.
This allows our Robots to “see” the screen and visually identify all the elements, rather than relying on their hidden properties, IDs, and other metadata.
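As a rough illustration of how fuzzy text matching and anchoring can work together, consider the Python sketch below. It is not UiPath's implementation. It assumes a hypothetical detector has already produced a list of on-screen elements with OCR text and bounding boxes, and the `Element` class, the `find_target` function, and the 0.75 threshold are all invented for this example. The idea is to find the label whose OCR text is closest to the wanted text, then pick the nearest element of the desired kind.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Element:
    """Hypothetical detector output: an element kind, OCR text, and a box."""
    kind: str  # e.g. "label" or "input"
    text: str  # OCR result, possibly with recognition errors
    x: int     # left edge in pixels
    y: int     # top edge in pixels
    w: int
    h: int

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_anchor(elements: List[Element], wanted: str,
                threshold: float = 0.75) -> Optional[Element]:
    """Pick the label whose text best matches `wanted`, if close enough."""
    labels = [e for e in elements if e.kind == "label"]
    best = max(labels, key=lambda e: similarity(e.text, wanted), default=None)
    if best is None or similarity(best.text, wanted) < threshold:
        return None
    return best


def find_target(elements: List[Element], anchor_text: str,
                target_kind: str = "input") -> Optional[Element]:
    """Return the element of `target_kind` closest to the anchor label."""
    anchor = find_anchor(elements, anchor_text)
    if anchor is None:
        return None
    ax, ay = anchor.center
    candidates = [e for e in elements if e.kind == target_kind]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda e: (e.center[0] - ax) ** 2 + (e.center[1] - ay) ** 2,
    )


# Toy screen: OCR misread "Username" as "Usernarne".
screen = [
    Element("label", "Usernarne", 20, 40, 80, 20),
    Element("input", "", 120, 40, 200, 20),
    Element("label", "Password", 20, 80, 80, 20),
    Element("input", "", 120, 80, 200, 20),
]

target = find_target(screen, "Username")
print(target.x, target.y)  # 120 40
```

Because the lookup depends on text similarity and relative position rather than on pixels, a change of color, font, or resolution would not affect it, provided the detector still reports the same elements.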
In fact, this new AI Computer Vision capability isn’t just limited to VDI environments. It can also recognize elements across a wide range of cases where traditional UI automation methods struggle, including SAP, Flash, Silverlight, PDFs, and even images.
Unlike traditional image automation, our AI Computer Vision does not rely on image matching. As a result, it’s highly resilient to interface changes including color, font, size, and resolution changes. The AI Computer Vision handles all these changes at once and still finds the intended target.
Granted, this whole technology is still in its infancy, and we have big plans for it. Throughout the year we’ll add a few more usability improvements to the current version, including support for recording full automations using AI Computer Vision. Then, in V2 (and we’re really excited about this), we’ll bring a whole new level of capability and robustness.
We also have a kind request, because with your help, we can make AI Computer Vision both faster and better: please use the Report functionality in the wizard to alert us to any gaps. It's the best way to make it smarter and better for your needs.