Technical Tuesday: Bridging traditional automation and agentic autonomy


With the rise of agentic AI, organizations are asking what role traditional automation will have in the future. But this question misses the mark. Agentic AI won’t replace automation; it will make it even more valuable.

UiPath has always been ahead of the curve. Our early adoption of AI technologies like computer vision and document understanding proved that automation could be more than a set of rigid rules. It could see, think, and even reason. That same mindset drives our current vision around agentic automation: combining AI agents, robots, and people to enable long-running, adaptive workflows with real-time decisioning.

A spectrum of control, risk, and agency

Every AI capability involves a trade-off between agency and reliability. Agentic AI is no different and, because of its underlying model architecture, it usually prioritizes high agency. This makes AI agents ideal for tasks that involve a lot of uncertainty and require adapting on the fly. However, businesses need different levels of control for different kinds of tasks.

This need for granularity and customization informs our own approach to agentic UI automation:

[Figure: UiPath agentic UI automation diagram]

Agentic UI automation is a category of agentic automation that focuses on using agentic AI to perform UI-based tasks. The UiPath Platform™ provides access to agentic UI automation through different layers, giving you the flexibility to choose a solution that addresses your business need with the right balance of agency and risk:

  • At the selector level: UiPath semantic selectors help robots better identify UI elements as part of your automation. This happens at the micro-task level but covers scenarios where traditional selectors fall short, allowing you to benefit from agentic AI with minimal risk and maximum predictability.

  • At the web form level: Similar to semantic selectors, UiPath semantic activities use agentic AI to work with UI elements at the micro-task level. However, they handle more complex and dynamic scenarios and are specifically designed for working with web forms. Semantic activities bring greater resiliency and adaptability to your automations while deploying agentic AI with limited, controlled autonomy.

  • At the single-app context level: UiPath ScreenPlay leverages large action models (LAMs) to turn your instructions into real, on-screen actions. Unlike the micro-task solutions mentioned above, ScreenPlay operates across an entire application with greater agency and can perform tasks previously out of reach for traditional automation.

  • At the multi-app context level: The final layer of our agentic UI offering will be an attended desktop agent (currently in development). It will act as a digital assistant that can autonomously execute tasks across multiple apps and systems.
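To make the layering concrete, here is a minimal Python sketch of the idea of picking the lowest-agency layer that still covers a task. The `Scope` enum, the `LAYERS` mapping, and the `narrowest_layer` helper are hypothetical names invented for illustration; they are not part of any UiPath API, and the real decision will usually weigh more than scope alone.

```python
from enum import IntEnum
from typing import Iterable


class Scope(IntEnum):
    """How much of the UI a task touches, ordered from least to most agency."""
    ELEMENT = 1      # a single UI element
    WEB_FORM = 2     # a single web form
    SINGLE_APP = 3   # one whole application
    MULTI_APP = 4    # several applications


# Layer names come from the list above; the mapping itself is illustrative.
LAYERS = {
    Scope.ELEMENT: "semantic selectors",
    Scope.WEB_FORM: "semantic activities",
    Scope.SINGLE_APP: "ScreenPlay",
    Scope.MULTI_APP: "attended desktop agent (in development)",
}


def narrowest_layer(needs: Iterable[Scope]) -> str:
    """Return the lowest-agency layer that still covers every stated need."""
    return LAYERS[max(needs)]


print(narrowest_layer([Scope.ELEMENT, Scope.WEB_FORM]))  # prints "semantic activities"
```

The point of the sketch is the ordering: a task is handed to a higher-agency layer only when a narrower one cannot cover it.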

DOM extraction: the core engine behind our agentic offering

Before diving into the details of each of the layers of our agentic offering, let's talk about the AI-powered document object model (DOM) extraction engine that sits at the heart of all our agentic UI automation capabilities.

Most of the recent work on computer use focuses on using screenshots to observe the target application’s state and content. It’s a fair approach that tries to stay as close as possible to the way humans observe their environment (including screens), but it lacks a few key advantages that come with extracting a curated DOM:

  • The ability to see an entire scrollable webpage at once. A screenshot captures only the current viewport, so the rest of the page has to be reached through slow, frustrating scrolling actions. DOM-based approaches provide not only more reliable and accurate data extraction, but also faster processing, because all the content is extracted once at run time.

  • The ability to use useful hidden data attached to on-screen UI elements, such as a hidden text label that describes an icon.
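As a rough illustration of the second point, the sketch below uses Python's standard `html.parser` module to pull text labels that never appear in the pixels of a screenshot. It is a toy under stated assumptions, not our extraction engine: the `LabelExtractor` class, the choice of `aria-label`, `title`, and `alt` as label attributes, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser


class LabelExtractor(HTMLParser):
    """Collect (tag, label) pairs from attributes that describe an element."""

    # Assumption for this sketch: these attributes carry the hidden label.
    LABEL_ATTRS = ("aria-label", "title", "alt")

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name in self.LABEL_ATTRS:
            value = attr_map.get(name)
            if value:
                self.labels.append((tag, value))
                break  # one label per element is enough here


def extract_hidden_labels(html: str):
    """Parse the markup once and return every labeled element found."""
    parser = LabelExtractor()
    parser.feed(html)
    parser.close()
    return parser.labels


SAMPLE_PAGE = """
<form>
  <button aria-label="Download invoice"><svg></svg></button>
  <img src="logo.png" alt="Company logo">
  <input type="text" name="q">
</form>
"""

labels = extract_hidden_labels(SAMPLE_PAGE)
print(labels)  # [('button', 'Download invoice'), ('img', 'Company logo')]
```

On screen, the button is only an icon; in the DOM, it states what it does.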

That's why we’ve heavily invested in building our own custom AI-powered DOM extraction engine. It powers semantic selectors, semantic activities, ScreenPlay, as well as our Clipboard AI capability (recognized as one of TIME's Best Inventions of 2023). The Clipboard AI capability is available as part of UiPath Autopilot.

The best part about DOM extraction is that it works in conjunction with image-based understanding (using AI Computer Vision internally and another dedicated AI model for building the target-anchor pairs), so it doesn’t miss out on relevant screen understanding clues that pure-DOM extractors might miss.

Micro agents: where the agentic story begins

Agentic automation doesn’t have to be all-or-nothing. As mentioned earlier, different business needs call for different levels of agentic solutions. And sometimes that means leveraging agentic AI for micro-level tasks. That's why we introduced the concept of micro agents as the intelligent stepping stones between the more traditional automation activities and full-blown autonomous agents.

Semantic selectors and semantic activities (like Fill Form, Extract Form Data and Update UI Element, formerly known as Set Value) can be defined as micro agents. They are designed for specific, high-precision tasks and enable you to benefit from:

  • Task-specific intelligence

  • Precise semantic matching, powered by generative AI

  • Predictable execution, enabled by layered fallback strategies

They are not micro agents in name only. These tools exhibit constrained autonomy: reliable, bounded, and robust. Rather than limiting agentic potential, micro agents enable it, laying the groundwork for higher levels of autonomy while ensuring enterprise-grade stability.

Let's take a closer look at how semantic selectors and semantic activities bring the vision of micro agents to life.

Semantic selectors: targeting made resilient

Built on top of our custom DOM extraction engine and AI reasoning engines like GPT, semantic selectors elevate how automation identifies UI elements. Traditional selectors, while fast, are fragile. Semantic selectors overcome this by allowing developers (or an AI assistant like UiPath Autopilot™) to describe UI targets in plain language: "the button that submits the form".

At runtime, the system intelligently determines which fallback layer to use: strict and fuzzy selectors first, then semantic selectors, and then computer vision. This best-of-both-worlds strategy delivers both low-latency execution and high resilience.
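The layered order described above can be sketched in a few lines of Python. Treat this as an assumption-heavy illustration: every function name, the dictionary-based `screen` structure, and the simple try-each-layer-in-order loop are hypothetical stand-ins, and the real runtime decides which layer to use in a more sophisticated way.

```python
from typing import Callable, Optional, Sequence, Tuple

# A locator inspects the (toy) screen state and returns an element id, or None.
Locator = Callable[[dict], Optional[str]]


def strict_selector(screen: dict) -> Optional[str]:
    # Exact attribute match: fast, but breaks when the id changes.
    return screen.get("by_id", {}).get("submit-btn")


def fuzzy_selector(screen: dict) -> Optional[str]:
    # Tolerates small changes (here: any id that still starts with "submit").
    for element_id, element in screen.get("by_id", {}).items():
        if element_id.startswith("submit"):
            return element
    return None


def semantic_selector(screen: dict) -> Optional[str]:
    # Stand-in for a model resolving a plain-language description.
    return screen.get("by_description", {}).get("the button that submits the form")


def computer_vision(screen: dict) -> Optional[str]:
    # Stand-in for image-based lookup as the last resort.
    return screen.get("by_image", {}).get("submit")


FALLBACK_CHAIN: Sequence[Tuple[str, Locator]] = (
    ("strict", strict_selector),
    ("fuzzy", fuzzy_selector),
    ("semantic", semantic_selector),
    ("computer vision", computer_vision),
)


def locate(screen: dict, chain: Sequence[Tuple[str, Locator]] = FALLBACK_CHAIN) -> Tuple[str, str]:
    """Try each layer in order and report which one found the element."""
    for name, locator in chain:
        element = locator(screen)
        if element is not None:
            return name, element
    raise LookupError("no layer could locate the target element")


original = {"by_id": {"submit-btn": "el-7"}}
redesigned = {
    "by_id": {"send-form": "el-7"},
    "by_description": {"the button that submits the form": "el-7"},
}
print(locate(original))    # ('strict', 'el-7')
print(locate(redesigned))  # ('semantic', 'el-7')
```

The cheap layers answer first when they can, which is where the low latency comes from; the slower layers run only after a miss, which is where the resilience comes from.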

Semantic activities: intelligent form-focused data extraction and input

Semantic activities are specifically designed for working with forms, offering a direct and powerful method for data extraction and data input. They use semantic matching, semantic execution, and DOM extractor-based interactions to handle dynamic scenarios where screen elements might change and otherwise break execution. They make it easy to transfer data between web forms of various kinds, greatly simplifying any form-filling or form extraction scenario.
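To show why matching on meaning rather than on exact labels matters for form-to-form transfer, here is a small Python sketch. It is deliberately simplified: a hand-written synonym table (`CONCEPTS`) stands in for the generative AI matching, and the names `concept_of` and `transfer` are hypothetical, not Fill Form or Extract Form Data internals.

```python
from typing import Dict, List, Optional

# Hypothetical synonym groups standing in for model-based semantic matching.
CONCEPTS = {
    "first_name": {"first name", "given name", "forename"},
    "last_name": {"last name", "surname", "family name"},
    "email": {"email", "e-mail address", "email address"},
}


def concept_of(label: str) -> Optional[str]:
    """Map a field label to a shared concept, or None if it is unknown."""
    normalized = label.strip().lower()
    for concept, synonyms in CONCEPTS.items():
        if normalized in synonyms:
            return concept
    return None


def transfer(source_values: Dict[str, str], target_labels: List[str]) -> Dict[str, str]:
    """Fill target fields from source fields that mean the same thing."""
    by_concept = {}
    for label, value in source_values.items():
        concept = concept_of(label)
        if concept is not None:
            by_concept[concept] = value

    filled = {}
    for label in target_labels:
        concept = concept_of(label)
        if concept in by_concept:
            filled[label] = by_concept[concept]
    return filled


source_form = {
    "First name": "Ada",
    "Surname": "Lovelace",
    "E-mail address": "ada@example.com",
}
target_form = ["Given name", "Family name", "Email", "Phone"]

print(transfer(source_form, target_form))
# {'Given name': 'Ada', 'Family name': 'Lovelace', 'Email': 'ada@example.com'}
```

None of the labels match character for character, yet three of the four target fields are filled; "Phone" is left empty because the source has nothing equivalent.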

From micro to macro scale: UiPath ScreenPlay

We've applied the same principles behind semantic selectors and semantic activities to UiPath ScreenPlay. ScreenPlay is an agent that brings intelligent automation directly to the user's desktop. Unlike task-specific micro agents, ScreenPlay operates across an entire application. It understands natural language goals, like "find the invoice from last month and download it," and autonomously navigates interfaces the way humans do to execute the needed actions. It can also handle input/output and monitor UI state.

For now, ScreenPlay is scoped to a single app or URL, ensuring high reliability. But it opens the door to broader agentic execution patterns without sacrificing predictability. ScreenPlay is all about unlocking the true potential of large action models (LAMs).

Attended desktop agent: a true computer use capability

And we’re not stopping there. Our team is hard at work creating a next-gen attended desktop agent that will operate as an autonomous digital assistant across multiple applications on your machine. Think of it as an entity capable of understanding your workspace context and acting accordingly in an attended environment.

Our vision for the desktop agent isn't about discarding what made traditional automation great. It's about enhancing it with layers of intelligence that make automation more flexible, resilient, and ultimately more helpful. Stay tuned for more updates about the desktop agent.

Automation just got a lot smarter

To sum up, agentic UI automation isn’t a replacement for traditional automation; it’s an evolution. From the early days of computer vision to today’s layered agentic automation stack, we’ve consistently delivered pragmatic, resilient, and intelligent automation.

If automation was about getting repetitive tasks done, agentic UI automation focuses on tasks that repetitive actions can’t solve: those that require a high degree of adaptability, autonomy, and intelligence, and that depend on variable inputs. That journey starts with trust, predictability, and a gradual path toward agency, all things that we've built into the DNA of the UiPath Platform.

You don’t need to throw the most powerful tool at every problem; some are simple enough to solve with more cost-effective methods. Instead, you want to optimize what you’re using for every situation. Therefore, you need the flexibility of a platform that can cover all use cases. You might need low risk and high reliability for critical, repetitive processes, while still introducing intelligence and autonomy where it adds value to your outcomes. It’s a philosophy that echoes the grocery store REMA 1000's marketing principle: "simplicity is king."

With micro agents like semantic activities and semantic selectors at the base, ScreenPlay as the next evolution you can use today, and our upcoming attended desktop agent, the UiPath agentic stack empowers you to scale automation confidently and intelligently.

Want to get started? Join the UiPath Insider program to try out ScreenPlay and get early access to new features and products.

Bogdan Sultana

Senior Product Manager, UiPath
