From scripts to speech: How ScreenPlay is redefining UI automation

Share at:

Man writing on laptop

UiPath ScreenPlay

For years, UI automation has relied on a familiar formula: selectors, step-by-step actions, and carefully designed logic to get robots to click, type, and navigate like people. It has worked remarkably well, powering millions of automations across every industry. But as applications evolve faster than ever—new releases every week, dynamic layouts, responsive designs—these traditional approaches are being tested.

With the general availability of UiPath ScreenPlay, a new chapter begins.

ScreenPlay introduces something fundamentally different: a natural-language-driven computer-using agent that understands what you want to accomplish and autonomously figures out how to do it on-screen. Instead of describing every click and selector, you describe your goal—and the agent handles the rest.

This new approach doesn’t replace traditional RPA. It enhances it, filling in the gaps where selectors break, where screens change frequently, or where UI complexity makes traditional automation costly to build and maintain. And it opens the door to automating scenarios that were previously considered too fragile or too variable to automate at all.

Why ScreenPlay matters?

Enterprises live in a world of hybrid application ecosystems: legacy systems, mainframes, modern SaaS, custom in-house tools, and constantly refreshed web portals. As these applications evolve, maintenance becomes one of the biggest hidden costs of automation programs.

ScreenPlay directly addresses this challenge with three major capabilities:

1. Natural-language UI automation

Developers describe the task—“open the filter panel and download the invoice from September 2024”—and ScreenPlay builds and executes the multi-step plan. No selectors. No rehearsed click paths.

2. Adaptive screen understanding

Even if the UI changes, the agent reads the current screen, interprets its structure, and adapts its actions in real time.

3. Cross-platform consistency

Whether users work on Windows, macOS, or Linux, ScreenPlay behaves consistently across all of them. This combination gives enterprises a powerful new tool for the most difficult parts of UI automation—dynamic lists, virtualized tables, unpredictable UI elements, and applications that change frequently.

ScreenPlay: A closer look

ScreenPlay brings together several UiPath technologies and the best industry LAMs to create a reliable, fully autonomous UI agent.

Perception: Seeing the application

ScreenPlay captures screenshots, DOM elements, and accessibility information. It doesn’t rely solely on selectors—it uses a blended approach with UiPath’s screen understanding models built from decades of UIAutomation innovation, including DOM extraction and UI grounding models that “understand” the screen layout.

Reasoning: Understanding your intent

Large Action Models (LAMs) power the reasoning layer. ScreenPlay supports:

  • UiPath GPT-5 + DOM

  • GPT-4.1 + DOM

  • GPT-5 Mini + DOM

  • Gemini 2.5 Flash + DOM

  • Anthropic Computer Use

  • Operator

  • BYOM through Azure, OpenAI, OpenAI, Amazon Bedrock, Google Vertex, and any provider using an OpenAI-compatible API

ScreenPlay turns your natural-language prompt into a structured, multi-step plan—complete with clicks, scrolls, inputs, and recovery behavior.

Execution: Interacting with the UI

The agent carries out actions one by one, constantly checking the screen and adjusting as necessary. If something unexpected appears—a popup, a modal, a shifted element—ScreenPlay adapts.

As it executes each step, the agent continuously re-evaluates the UI state and adjusts its actions accordingly, rather than following a fixed script.

All of it runs within the framework enterprises already trust: UiPath Studio and Studio Web for design, Orchestrator for deployment and control, and the AI Trust Layer for governance and security.

Installing and enabling

One of the strengths of ScreenPlay is how quickly you can adopt it inside existing UiPath projects.

Studio prerequisites

You do not need to install a separate “ScreenPlay application.” Instead, you:

  • Use UiPath Studio 2025.10 or later for the best experience (modern prompt editor, inline variables, inline images).

  • Install UiPath.UIAutomation.Activities 2025.10.20 or newer from the Official feed.

  • Connect Studio to your Automation Cloud tenant, where ScreenPlay is licensed.

ScreenPlay works across Windows, macOS, and Linux desktops and can run both attended and unattended once published to Orchestrator.

Enabling the ScreenPlay add-on

ScreenPlay is delivered as an add-on to UiPath’s UIAutomation capabilities:

  • Advanced and enterprise platform tiers include the ScreenPlay add-on with 50,000 ScreenPlay runs per year when using Standard models.

  • If you consume that bundle using Basic models, you effectively get five times more runs for the same entitlement.

Once the add-on is active, eligible user licenses (for example, cloud basic, Automation Developer in Flex; Basic, Plus, Pro, App Test Developer in Uni) can design with ScreenPlay in Studio and Studio Web.

For Community organizations, ScreenPlay is available immediately with a monthly entitlement (for example, 500 runs/month with Standard models and more if using Basic models).

Self-serve trial

For enterprise customers that want to experiment before committing, a self-service ScreenPlay trial is now available:

  • Activated directly from Automation Cloud → Licenses → Free Trials

  • Includes a fixed number of runs (5,000) valid for 60 days

  • Does not require sales involvement or manual approval

How to use ScreenPlay in Studio and Studio Web

Once ScreenPlay is licensed and the UIAutomation package is updated, the workflow to use it is straightforward.

Step 1: Add ScreenPlay to a workflow

In Studio: 1. Open or create a process. 2. Add a Use Application/Browser activity to define the application context (a desktop app or a browser window). 3. Inside its Do block, search for ScreenPlay and drop the activity.

In Studio Web:

1. Go to Studio → Create New → RPA workflow. 2. Add the ScreenPlay activity directly to your flow. 3. Configure it similarly to Studio Desktop.

Step 2: Describe the task

In the Task property of the ScreenPlay activity, describe the goal in natural language:

“Open the billing filter, select September 2024, and download the latest invoice as a PDF.”

If your process is multi-step, resist the temptation to put the entire end-to-end flow in one prompt. Instead, split it across multiple ScreenPlay activities that each handle one small subtask. This is where best practices start to matter.

Activity properties

To get consistent behavior, it helps to understand what the core properties of the ScreenPlay activity do.

Understanding ‌ScreenPlay activity properties

While exact labels can evolve over time, you will typically see properties in these categories:

Task

  • What it is: The main natural-language instruction that tells the agent what to do in the current application context.

  • How to use it: Describe the intent and any constraints (dates, values, filters, conditions). Be explicit about which elements the agent should interact with (by label, context, or description)

Model

  • What it is: The selected Large Action Model (LAM) that powers the agent’s reasoning

  • How to use it:

  1. Use standard models (e.g., GPT-5 + DOM, GPT-4.1 + DOM, Operator, Anthropic Computer Use) when tasks are complex, ambiguous, or involve nuanced decision making.

  2. Use basic models (e.g., GPT-5 Mini + DOM, Gemini 2.5, Flash + DOM) for straightforward, repetitive tasks where cost and speed matter more than deep reasoning.

Maximum number of steps / actions

What it is: A limit on how many iterative actions the agent can take in a single ScreenPlay run.

How to use it: Set this high enough to allow the agent to complete the task (for example, 5–15 steps for simple flows). If you see the agent “wandering”, reduce this limit and break the process into smaller prompts.

DOM usage / UI grounding options

What it is: A setting that controls how heavily the agent relies on DOM data (when available) vs purely visual cues.

How to use it:

  1. For web automation, keep DOM usage enabled for higher precision and better element targeting.

  2. For some legacy or non-standard UIs, ScreenPlay may rely more on visual understanding.

Timeout

What it is: The maximum time ScreenPlay is allowed to run for this step.

How to use it:

  1. Tune based on how long the application normally takes to load and respond.

  2. Use higher timeouts for slow systems but avoid excessively long values that hide performance issues.

Variable security

What it is: A safety control that decides whether variables passed into the Task are treated as literal data or as part of the instruction.

How to use it:

  1. ON (recommended in production): ScreenPlay treats variables like {{invoice_number}} as plain text only. Any embedded directive (e.g., “ignore previous instructions”) is neutralized.

  2. OFF (for debugging only): Variables can influence the agent’s reasoning. This is useful for diagnosing prompt behavior, but unsafe for live workloads.

Variable Security

You can also control trace file generation and retention at the project level:

Project Settings → UIAutomation Modern → ScreenPlay

This determines whether HTML execution traces are saved and for how long.

Licensing, runs, and BYOM in practice

Under the hood, ScreenPlay’s usage model is simple once you understand the key concepts.

What is a ScreenPlay “run”?

A run is the billing unit for ScreenPlay and includes up to five UI actions:

  • 1 run = 1–5 UI actions

  • 2 runs = 6–10 UI actions

  • 3 runs = 11–15 UI actions

Actions are things like click, type, select, or scroll interactions. If a ScreenPlay Task stays within 1–5 actions, you only consume one run.

Standard vs basic models and cost

ScreenPlay introduces two model tiers:

1. Standard models:

  • High capability (GPT-5 + DOM, GPT-4.1 + DOM, Operator, Anthropic CU)

  • Higher per-run cost (more Agent Units or platform units)

2. Basic models:

  • Lightweight, faster models (GPT-5 Mini + DOM, Gemini 2.5 Flash + DOM, GPT-4.1 Mini + DOM)

  • Lower per-run cost, typically five times more runs from the same bundle

The ScreenPlay add-on’s 50,000 runs are defined assuming standard models. If you only use basic models, that bundled capacity goes much further.

Bring your own model (BYOM)

If your organization already has its own LLM subscription (for example, Azure OpenAI):

  • You configure ScreenPlay to use your own model via AI Trust Layer → LLM configurations → UI Automation / ScreenPlay.

  • Once BYOM is activated, UiPath does not charge consumption units for ScreenPlay usage. You pay only your own LLM provider.

This makes ScreenPlay extremely attractive in environments where an enterprise AI contract already exists.

Bring Your Own Model (BYOM)

Once your ScreenPlay task runs, the next step is to validate how the agent interpreted your prompt and executed each action.

Reviewing the execution trace

The execution trace is a crucial part of ScreenPlay, offering transparency that traditional ‘black box’ AI cannot.

After each run, ScreenPlay can generate an HTML “Execution Trace” file that includes:

  • The original prompt and a unique trace ID

  • Overall duration and a breakdown of where time was spent (DOM scanning, reasoning, actions)

  • Token usage for the LAM call

  • A step-by-step replay of the agent’s actions with screenshots

  • Highlighted bounding boxes where clicks or keystrokes occurred

  • Human-readable “thinking” for each step (why it chose that action)

  • Any errors or fallbacks that occurred

You can open this file in any browser and walk through what the agent did and why. This is invaluable for:

  • Debugging prompt design

  • Demonstrating behavior to security and audit teams

  • Tuning timeouts, step limits, and model choice

  • Comparing behavior between models (e.g., Basic vs Standard)

Reviewing the Execution Trace

The execution trace is a crucial part of ScreenPlay

Security and trust by design

ScreenPlay is governed by UiPath’s AI Trust Layer and built with a security-first design.

When and how to use ScreenPlay

ScreenPlay is not intended to replace all UI Automation. It is designed to solve the high-friction, high-variability areas of automation.

Use cases

  • Screens that change often (SaaS portals with frequent UI refreshes)

  • Dynamic or virtualized elements (infinite scroll, complex tables)

  • High-branching logic with many conditional paths

  • Multi-application workflows where context shifts often

  • Situations where selector tuning has been historically painful

  • Cross-platform automation scenarios (Windows + macOS + Linux)

Two starting patterns that work well

1. Upgrading problematic automations

  • Identify steps that fail often due to selectors or layout changes.

  • Replace only those steps with ScreenPlay while keeping the rest of the workflow intact.

  • Use granular tasks so ScreenPlay handles the fragile parts, not the entire process.

2. Designing new automations with granularity

  • Start from scratch using ScreenPlay activities for each well-defined UI subtask.

  • Keep each ScreenPlay activity focused on one or two related actions.

  • Choose Basic or Standard models per step based on complexity and cost.

When not to use ScreenPlay

  • Highly stable UIs that are already robust with classic selectors

  • Flows that are fully covered by APIs or Integration Service

  • Ultra high-volume, low-complexity transactions where pure selector-based automation is cheaper and deterministic

The real power comes from combining ScreenPlay with existing UiPath capabilities—using classical RPA, API automation, and ScreenPlay together, where each is strongest.

Why ScreenPlay can transform enterprise automation

ScreenPlay represents a significant shift toward adaptive automation. Instead of building brittle step-by-step scripts, developers can now build workflows that understand user intent, interpret complex applications, and adjust to change.

This fundamentally transforms:

  • Automation delivery speed  Developers build faster using natural language and reusable ScreenPlay patterns.

  • Maintenance cost  Automations become far more resilient to UI changes, reducing the ongoing effort to “fix selectors”.

  • Automation reach  Workflows once considered impossible to automate—because of UI variability or complexity—can now be handled with confidence.

  • Scalability  With cross-platform consistency, rich observability through execution traces, and enterprise governance through the AI Trust Layer, ScreenPlay supports high-volume unattended workloads safely.

ScreenPlay marks the beginning of a new era for UI automation—one where natural language becomes the interface to automation and where resilient, intelligent agents expand what enterprises can automate.

Try ScreenPlay

ScreenPlay is more than a new feature—it is an evolution in how automations interact with applications. By weaving together natural-language reasoning, deep UI understanding, rigorous observability, and enterprise-grade governance, ScreenPlay lays the foundation for the next generation of computer-using agents.

Enterprises ready to modernize their automation estates can begin incorporating ScreenPlay today—starting with the fragile steps in existing workflows, then expanding into new, previously infeasible scenarios. As the platform matures, its role will only grow, shaping how RPA, AI agents, and enterprise software come together across the next decade of intelligent automation.

As enterprises move toward agentic automation, ScreenPlay becomes an essential building block—bridging classical RPA with modern, adaptive, natural-language-driven UI agents.

Logesh Shunmuga Velu
Logesh Shunmuga Velu

Technical Account Manager, UiPath

Get articles from automation experts in your inbox

Subscribe
Get articles from automation experts in your inbox

Sign up today and we'll email you the newest articles every week.

Thank you for subscribing!

Thank you for subscribing! Each week, we'll send the best automation blog posts straight to your inbox.

Ask AI about...Ask AI...