UI agents: Unlocking the true potential of large action models

Cosmin Voicu

May 20, 2025

Introduction

A significant shift is happening in automation. If you've been following the tech space lately, you've likely heard the buzz about Large Action Models (LAMs) and how they're supposedly going to change everything about how we automate work. Companies like Anthropic with ComputerUse and OpenAI with Operator are making headlines with their ability to control computers using natural language instructions.

And honestly? The hype isn't entirely misplaced. This technology is genuinely impressive.

But there's a gap between public perception and reality that needs addressing. Today, we're pulling back the curtain on UI Agents: what they are, what they can (and can't) do, and why our approach might be the only practical way to harness their power in enterprise environments right now.

What are UI agents?

UI Agents are powered by Large Action Models (LAMs), AI systems trained to use computers the way people do, with a mouse and keyboard. What makes them revolutionary is that instead of relying on hardcoded steps and selectors, as traditional RPA does, they operate based on goals and natural language instructions.

Instead of programming each specific step to navigate to a folder, scroll down, click on a file, and so on, you can simply specify "Find the invoice from last month and download it" as the goal. These agents can "see" the screen, understand context, and adapt to changes in interfaces that would break traditional automations. They bring a whole new level of flexibility and intelligence to automation that wasn't possible before.

The promise: Better, faster, easier, more

The potential upside of this technology is enormous:

More resilient automations: When UIs get updated or elements move around dynamically, UI Agents can adapt without breaking.
Lower development barrier: Creating automations becomes significantly easier, requiring less technical expertise.
Enables previously unfeasible tasks: Automate the same basic operations across systems with varying interfaces—like entering identical data into hundreds of different websites—that would require custom development for each site with traditional RPA.
New cognitive capabilities: Agents can make in-context decisions, navigate websites based on semantic criteria, extract insights from multiple sources, or simply operate interfaces without ever seeing them before.
Cross-platform compatibility: Works across operating systems, increasing the automation coverage of your company.

All of this adds up to a lower total cost of ownership for automations and opens the door to scenarios that were previously unfeasible or prohibitively expensive to automate.

The reality check

The promise is real, but so are the limitations. As exciting as this technology is, the current generation of LAMs faces several challenges:

They can be slow compared to traditional RPA
They're expensive to run
They can be unpredictable on longer, complex scenarios

It's worth noting that all current models on the market are essentially first-generation (V1) implementations. There are no V2 models out there yet. The technology is still in its early stages, and even when successful, these models have relatively low reliability on longer scenarios. If you run the same prompt multiple times, they won't succeed 100% of the time, and they won't run consistently.

That said, the pace of improvement is remarkable. These models are evolving rapidly, and what seems challenging today may be routine in six months. But for now, they're comparable to skilled assistants with unique strengths and clear limitations: impressive, but not yet ready to run your entire business unsupervised.

Asking an LAM to "buy me a plane ticket to New York" might work occasionally in a demo, but run it a thousand times in production, and you'll see failures that would be unacceptable in an enterprise environment.
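
A quick back-of-the-envelope calculation shows why longer scenarios are so much harder. The sketch below assumes every step succeeds independently with the same probability, which is a simplification, and the 98% per-step figure is purely illustrative rather than a measured number for any model.

```python
def scenario_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


def expected_failures(per_step_success: float, steps: int, runs: int) -> float:
    """Expected number of failed runs out of `runs` attempts."""
    return runs * (1.0 - scenario_success_rate(per_step_success, steps))


if __name__ == "__main__":
    # Illustrative per-step success rate (an assumption, not a benchmark).
    p = 0.98
    for steps in (3, 5, 20, 50):
        rate = scenario_success_rate(p, steps)
        failures = expected_failures(p, steps, runs=1000)
        print(f"{steps:>2} steps: {rate:.1%} success, ~{failures:.0f} failures per 1,000 runs")
```

Under these assumptions, a 5-step task completes about 90% of the time, while a 50-step scenario completes only about 36% of the time. Real failures are rarely this independent, but the compounding effect is the point.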

Finding the sweet spot

So how do we harness the power of this technology while working around its current limitations?

The answer is micro-tasks.

After extensive testing, we've discovered that LAMs truly shine when handling very short sequences of 1–5 steps. A couple of clicks, a few typing actions. That's where reliability starts to approach enterprise standards.

Examples of excellent micro-tasks for UI Agents:

Locating and clicking a specific button that may move position based on dynamic content
Filling out a form field with contextually appropriate information
Extracting specific data from a table based on semantic criteria

This might seem limiting at first glance, but there's a powerful solution: combining these micro-tasks with traditional automation capabilities.
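
Here is a minimal sketch of what that combination could look like. All names in it (`Context`, `ScriptedStep`, `AgentMicroTask`, `run_workflow`, `fake_agent`) are hypothetical and are not part of any UiPath API; the agent is a stub standing in for a real model call. The only idea it illustrates is that scripted steps and short agent goals can sit in the same workflow, with the agent's step budget capped.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

MAX_AGENT_STEPS = 5  # upper end of the 1-5 step range described above


@dataclass
class Context:
    """Shared state passed between steps (hypothetical structure)."""
    data: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


@dataclass
class ScriptedStep:
    """A traditional, deterministic automation step."""
    name: str
    run: Callable[[Context], None]


@dataclass
class AgentMicroTask:
    """A short natural-language goal handed to a UI agent."""
    goal: str
    max_steps: int = 3

    def __post_init__(self) -> None:
        if not 1 <= self.max_steps <= MAX_AGENT_STEPS:
            raise ValueError(f"micro-tasks should stay within 1-{MAX_AGENT_STEPS} steps")


Step = Union[ScriptedStep, AgentMicroTask]
AgentRunner = Callable[[str, int, Context], None]


def run_workflow(steps: List[Step], agent: AgentRunner) -> Context:
    """Run scripted steps directly and delegate micro-tasks to the agent."""
    context = Context()
    for step in steps:
        if isinstance(step, AgentMicroTask):
            agent(step.goal, step.max_steps, context)
            context.log.append(f"agent: {step.goal}")
        else:
            step.run(context)
            context.log.append(f"scripted: {step.name}")
    return context


def fake_agent(goal: str, max_steps: int, context: Context) -> None:
    """Stand-in for a real model call; it only records the goal."""
    context.data["last_agent_goal"] = goal


if __name__ == "__main__":
    def open_portal(ctx: Context) -> None:
        ctx.data["page"] = "invoices"

    workflow = [
        ScriptedStep("open_portal", open_portal),
        AgentMicroTask("Click the 'Download Report' button next to the most recent entry", max_steps=2),
    ]
    result = run_workflow(workflow, fake_agent)
    print("\n".join(result.log))
```

Rejecting any micro-task with a budget above five steps is a design choice made for this sketch: it keeps long, open-ended goals out of the agent's hands by construction.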

The UiPath difference: Integration is everything

This is where UiPath enters the picture. Currently, the only way to use LAMs reliably in production, at scale, and unattended, is through integration with an automation platform such as ours.

Here's why our approach works:

Access to leading models in one place: UiPath provides access to Anthropic's ComputerUse, OpenAI's Operator, as well as our own versions right out of the box. No need to manage multiple licenses or integrations. As new models emerge, we'll continue to incorporate them, ensuring you always have access to the best tools for each job. And since prices and capabilities vary wildly between them, with orders-of-magnitude differences, this will come in very handy.
Deep platform integration: We've embedded these models into our automation platform, allowing you to:
- String together multiple micro-tasks to create complex, reliable processes
- Combine LAM capabilities with our full suite of RPA tools
- Debug and steer the models for maximum performance
Enterprise-grade capabilities: Schedule, monitor, audit, and scale your automations with the same robust tools you already trust.
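
To make the point about price and capability differences concrete, here is a small sketch of one possible selection rule: pick the cheapest model that clears a reliability bar for a given task type. The `ModelOption` type, the `pick_model` function, the model labels, and every number are invented for illustration. They do not describe real products, real prices, or how UiPath chooses models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str             # placeholder label, not a real product name
    cost_per_task: float  # made-up cost units
    success_rate: float   # made-up success rate for one task type


def pick_model(options: List[ModelOption], min_success_rate: float) -> Optional[ModelOption]:
    """Return the cheapest option that meets the bar, or None if none does."""
    eligible = [m for m in options if m.success_rate >= min_success_rate]
    return min(eligible, key=lambda m: m.cost_per_task, default=None)


if __name__ == "__main__":
    # Placeholder catalog with an orders-of-magnitude cost spread.
    catalog = [
        ModelOption("model-a", cost_per_task=0.002, success_rate=0.90),
        ModelOption("model-b", cost_per_task=0.05, success_rate=0.97),
        ModelOption("model-c", cost_per_task=0.40, success_rate=0.99),
    ]
    choice = pick_model(catalog, min_success_rate=0.95)
    print(choice.name if choice else "no model meets the bar")
```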

While we believe this integrated approach is currently the only practical way to deploy LAMs in enterprise environments, we expect it will soon become the industry standard. The future of automation isn't about LAMs replacing RPA. It's about intelligently combining these technologies to get the best of both worlds, just as API automation has coexisted with UI automation.

In action

The power of this approach becomes clear in practical examples:

Imagine an automation that needs to interact with a web application where elements dynamically change position based on content. Traditional RPA might struggle with reliable selectors, requiring complex workarounds and constant maintenance.

With UI Agents, you can simply instruct: "Click the 'Download Report' button next to the most recent entry." The agent understands the semantic relationship and completes the task regardless of where the button appears on screen.
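
One way to combine the two approaches for a case like this is to try the scripted selector first and hand the click to the agent only when the selector fails. The sketch below shows that fallback pattern. It is an assumption about how such a step could be wired, not a description of how UiPath implements it, and `SelectorNotFound`, `click_with_fallback`, and both callables are hypothetical.

```python
from typing import Callable


class SelectorNotFound(Exception):
    """Raised by the scripted path when its selector no longer matches."""


def click_with_fallback(
    click_by_selector: Callable[[], None],
    click_by_instruction: Callable[[str], None],
    instruction: str,
) -> str:
    """Try the fast scripted click first; fall back to the agent on failure.

    Returns which path handled the click, so it can be logged or audited.
    """
    try:
        click_by_selector()
        return "selector"
    except SelectorNotFound:
        click_by_instruction(instruction)
        return "agent"


if __name__ == "__main__":
    def broken_selector() -> None:
        # Simulates a selector that stopped matching after a UI change.
        raise SelectorNotFound("button moved")

    def stub_agent(instruction: str) -> None:
        # Stand-in for a real agent call.
        print(f"agent handling: {instruction}")

    path = click_with_fallback(
        broken_selector,
        stub_agent,
        "Click the 'Download Report' button next to the most recent entry.",
    )
    print(f"handled by: {path}")
```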

We've even demonstrated the flexibility of UI Agents by having them identify and click "Play" buttons across similar but entirely different video game interfaces without being specifically programmed for each one. This was impossible with traditional automation approaches.

Neither image-based nor selector-based automation would work, since the buttons look different, are in different languages, and selectors are not available.

The ability to understand context and adapt to different visual layouts showcases exactly why this technology is so promising when applied appropriately.

The future is coming, but we are not waiting

Will LAMs eventually become powerful enough to handle entire complex processes without assistance? Almost certainly. But that future isn't here yet.

In the meantime, the UiPath approach lets you start capturing the benefits of this revolutionary technology today, without sacrificing the reliability and scalability your business requires.

UI Agents aren't replacing traditional RPA. They're enhancing it, creating a whole that's greater than the sum of its parts.

Join in

UiPath UI Agent* is now available in public preview for all enterprise license users. If you're interested in exploring how this technology can transform your automation strategy, please explore the detailed user guide on the UiPath documentation portal.

If you have any questions or run into any issues, please reach out to our team on the UiPath Forum.

To participate in this public preview, you'll need an existing license with one of the LLM vendors supported by UiPath (see our documentation for details).

*UI Agent is a code name used during the preview stage. The final commercial name will change before general availability.

Topics:

Agentic

Cosmin Voicu

Principal Product Manager, UiPath