Let’s get real: AI agents aren’t easy to build or deploy. But once they’re embedded, the impact is incredible. I love hearing from UiPath customers like Ainara Etxeandia Sagasti, Head of Digital Services at Lantik, who’s “combining RPA, generative AI, and agentic technology [to make] public services more accessible, efficient, and citizen-focused than ever.” Already, more than 10,000 AI agents have been built on the UiPath Platform™. Agents can transform process efficiency and profitability, but they need strong orchestration, support from deterministic automation, and humans in the loop.
In this blog post, I’ll cover the most common pain points when building, testing, or deploying AI agents at scale. I’ll also explain how an orchestrated approach—built on controlled agency and interoperability—can mitigate them.
Developers and users frequently cite the unreliability of AI agents as a barrier to production. Large language models (LLMs) make agents flexible and adaptable, but this also leads to inconsistent outputs. This can frustrate development and testing. As one engineer put it, “My agents sometimes work perfectly, then completely fail on similar inputs. We need better ways to simulate edge cases and reproduce failures consistently… monitoring agent ‘drift’ over time is a real headache.”
Another challenge is hallucinations—agents making up facts or tool inputs—which can grind processes to a halt. A user building AI workflows shared: “The biggest pain points we find are repeatability and hallucinations… ensuring that for the same or similar queries the LLM agents don’t go off the rails and hallucinate inputs to other tools.” This unpredictability demands extensive testing and validation, but agent testing tooling is still immature. When errors occur, they can be hard to diagnose because model reasoning is opaque. This makes teams extremely cautious about changes: “We’re so wary of system prompt changes at this point because we’ve been burned by telling the agent not to do something and then it starts behaving weird… so many times.”
The performance of underlying AI models is another problem. Large models can be resource-intensive or slow, while smaller models might not perform as well. Finding the right balance is challenging.
A lack of consistent, reliable outputs makes it difficult to trust AI agents with mission-critical or customer-facing tasks without extensive safeguards. In practice, achieving high reliability often requires simplifying agent behaviors, introducing strict constraints, or having fallbacks (like constant human intervention). Yet, these measures tend to compromise agent autonomy, efficiency, and therefore utility in value-adding enterprise scenarios.
While AI agents can automate complex tasks, developers find that human oversight and collaboration are essential—and striking the right balance is hard. Fully hands-off autonomy is often impractical because agents can make mistakes or unclear decisions. Enterprises need control over the degree of agency, which can increase over time as agents get more accurate and reliable.
A common approach is to keep a “human in the loop” for approvals, critical decisions, and exception handling, but this can slow processes if not well orchestrated. One AI engineer noted that constraining agents and involving humans leads to better outcomes: “Tightly constrained LLMs with human oversight can achieve good results for medium-complex tasks… [Fully] autonomous, general-purpose agents [at scale]” aren’t yet realistic.
On the flip side, if the AI is too tightly controlled or requires constant checking, it doesn’t deliver ROI. Sometimes an agent interrupts workflows or creates more effort than it saves. For example, one developer explained how a coding copilot disrupted productivity by forcing manual corrections: “It begins something but fails to finish it… I have to divert my attention to checking and closing the tags, parentheses, etc. It disrupts my flow, slowing me down.”
The challenge is designing hybrid workflows where agents handle the work but seamlessly hand off to humans for judgment calls, without creating extra friction.
The ROI of AI agents is a recurring concern, especially as usage scales. Large language model APIs (and the infrastructure to run them) can be expensive. Teams worry about cost blowouts if agents are not optimized. One user claimed that current agents are “too expensive” for what they achieve. ROI can be hard to measure when reliability is low. If an agent only succeeds part of the time, the cost of its failures (and manual fixes) can outweigh the benefits.
Enterprises are trying to control costs through methods like model optimizations and usage policies. One user described implementing caching to reduce repeated calls and carefully sourcing high-quality data to improve output efficiency. Others focus on choosing the right model for the job: “I would [love] a framework where I can have my prompt… run it across all different models, [and] find the best and cheapest. Right now my AI agent uses over 200 prompt templates, and testing and retesting them all is expensive.” Ultimately, prompt engineering and model experimentation incur real costs.
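That kind of caching is easy to sketch. Below is a minimal illustration in Python, where `call_llm` is a hypothetical stand-in for whatever provider client a team actually uses: repeated (or trivially rephrased) prompts are served from a local cache, so only cache misses cost an API call.

```python
import hashlib
import json

def call_llm(prompt: str, model: str) -> str:
    """Hypothetical stand-in for a real provider call (OpenAI, Anthropic, etc.)."""
    return f"[{model}] response to: {prompt[:40]}"

_cache: dict[str, str] = {}

def cached_completion(prompt: str, model: str = "default-model") -> str:
    """Serve repeated or near-identical prompts from a local cache."""
    # Normalize whitespace and case so trivial rephrasings still hit the cache.
    normalized = " ".join(prompt.lower().split())
    key = hashlib.sha256(json.dumps([model, normalized]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt, model)  # only a cache miss costs an API call
    return _cache[key]
```

A production version would use a shared store like Redis (as the team quoted below did) rather than an in-process dictionary.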
Vendor pricing models (per token, per call, etc.) also play a role. For example, using GPT-4 for everything might be overkill, but using a cheaper model could reduce quality. Teams must strike a balance to justify ROI. Furthermore, management might question the business value of agent projects if they require significant ongoing spend on cloud AI services or specialized infrastructure. Without clear wins (either in revenue gain or cost savings from automation), investment can be hard to defend. Thus, optimizing cost and demonstrating ROI are front of mind—teams want to “get the cheapest bang for my buck” with AI agents by mixing and matching models while focusing on high-value use cases.
Organizations must enforce security, compliance, and ethical guidelines on AI agents, but this is easier said than done. Data privacy is a top concern—many companies ban or restrict cloud AI services until they’re confident sensitive data won’t leak. One developer shared that their workplace forbids tools like ChatGPT because of intellectual property risks: “No. It is deemed too much of an IP risk, [fearing] it might leak our secrets or violate someone else’s copyright.” When using third-party AI APIs, practitioners worry about customer data inadvertently being sent to those services.
Security is another issue: autonomous agents pose a risk if not properly sandboxed. There are reports of teams adding extra safeguards on top of agent platforms—for example, “we had to add [a] security layer on top… [and] use caching (Redis) for cost optimization” when deploying a lead generation agent. Out-of-the-box solutions often lack enterprise-grade security controls or cost management, and companies must bolt on their own governance. Additionally, ensuring agents comply with regulations (GDPR, HIPAA, etc.) and follow organizational policies is difficult if agent frameworks don’t provide hooks for oversight.
These concerns make stakeholders cautious: they want AI agents to be powerful, but also transparent and controlled, built “with neutral, universally accepted protocols rather than proprietary systems” that hide how data is used. In short, without robust governance features (audit logs, permission controls, human override, etc.), many organizations hit a wall in wider agent deployment.
Moving an AI agent from proof-of-concept to production can introduce a host of issues. Users report that what works in a controlled demo often struggles with real-world scale, volume, and complexity. Common concerns include latency and throughput (LLM-powered agents can be too slow for high-traffic or real-time applications) and the operational overhead of running the system reliably. As Adrian Krebs, Co-Founder & CEO of Kadoa, put it, “It doesn’t matter if you’re using [an] orchestration framework if the underlying issue is that AI agents [are] too slow, too expensive, and too unreliable.” Teams often need to rearchitect for efficiency—using caching, swapping models, or simplifying agent logic—just to meet performance requirements.
There’s also the challenge of deploying across environments (cloud, on-premises, edge devices) while maintaining consistency. In enterprise settings, not all departments will want to use the same tools, which makes standardized deployment harder. Operational scaling issues like monitoring, logging, and updating agents in the field are likewise underdeveloped. One Reddit user noted that even basic debugging can be “a nightmare… error logs are often cryptic, with no clear troubleshooting guide.” This only gets tougher when many agents are deployed. All this can slow agent adoption. Even major vendors have admitted that customers are “just getting started” and meaningful at-scale results are still emerging.
Building systems where multiple AI agents collaborate is tricky. Developers struggle with coordinating agent roles, managing shared state, and preventing agents from getting stuck in loops or conflicting with each other. Even with orchestration frameworks, a misstep in one agent’s output can derail an entire workflow. As one developer claimed, “People are just experimenting. The unreliability is still a major issue: any derailing in the auto-regressive generation process can be fatal for an agent.” Others stress the difficulty of creating self-healing or resilient workflows—for example, adding logic to retry failed steps or escalate to a human.
These orchestration challenges mean teams often end up fixing one issue only for others to appear: “Sometimes it even feels like whack-a-mole. Fix one issue with some prompt engineering and then create three more.”
No single AI agent platform dominates the market. Organizations might use OpenAI one day, switch to an open-source model the next, and integrate various third-party tools. But compatibility and smooth integration remain major challenges. Tool and model integration often requires custom adapters or glue code. For example, connecting an agent to a proprietary database or an internal API can involve significant effort if the framework wasn’t designed with that in mind. Developers argue that many frameworks are “heavy” and come with assumptions that don’t fit all use cases: “Unfortunately many of these frameworks are pretty heavy if you just need basics.”
Conversely, going “framework-agnostic” often means writing a lot of boilerplate from scratch. Users want to avoid reinventing the wheel without getting locked in. One developer described settling on a more flexible library specifically to maximize compatibility: “I tried a lot… Eventually I settled for using [Instructor], because I could quickly switch between LLMs – both local/OS and proprietary – and I could have the same structured input/output everywhere.” This highlights the need for agents that allow easy swapping of AI models or services to meet evolving needs.
Another common need is integrating agents with existing software stacks and workflows. A lack of standard interfaces means each new agent might require a new integration effort, and missing examples and complex setup requirements only make this harder. Compatibility issues also arise when an update to one component (e.g., an LLM API change) breaks the agent’s logic—something teams have to actively manage. In short, practitioners want plug-and-play interoperability: AI agents that connect with various models, data sources, and systems without extensive custom engineering.
AI models and frameworks are changing fast. Many teams want best-of-breed tools and worry that choosing a single vendor’s AI agent solution could make them inflexible down the line. There’s an explosion of agent frameworks, each with its own APIs and considerations. One developer compared it to the JavaScript framework craze: “In a few months we’ll probably have our version of ‘TODO app in 100 different JS web frameworks’… Even just understanding them all is a huge task.”
Committing to one ecosystem can mean limited flexibility. Certain libraries favor specific providers. For instance, a frustrated user warned their choice of framework “mainly breaks,” highlighting how some tools implicitly lock you into particular models or services. The risk is building around a vendor’s vision and later finding yourself “locked in – dependent on its updates, pricing, and policies, with no viable alternative.” Interoperability is also a concern for integrating agents into existing software stacks. Developers often find “no clear examples” to hook agents into languages and cloud services they already use, making it harder to adopt these tools across diverse teams.
Many of these challenges point to a need for agentic orchestration solutions that are flexible, interoperable, and human-centric. Agentic orchestration assigns and manages tasks and responsibilities among people, robots, and AI agents according to their capabilities, ensuring operations run smoothly and efficiently and stay aligned with the business’s strategic outcomes.
An orchestration layer that effectively integrates reliable AI agents, deterministic automation, and human inputs offers several advantages:
Improved reliability via deterministic backstops: the ideal end state of the orchestrated approach is controlled agency, which allows for optimal efficiency without excessive manual intervention. This depends on the orchestration of specialized AI agents that are carefully constrained in processes by enterprise-grade tools, deterministic robots, and a comfortable level of human review. By combining AI agents with deterministic automation scripts or rules, an orchestration layer ensures there’s always a fallback path. For example, if an AI agent’s output doesn’t meet a certain accuracy threshold, a predefined rule might handle that case (or at least flag it). This hybrid approach leverages the creativity of AI but within guardrails. Over time, an orchestration platform could even learn which agent is most reliable for which task (through monitoring outcomes) and route tasks accordingly, thereby optimizing success rates. The net effect is higher overall reliability of the workflow compared to a single black-box agent. As one developer noted, “tightly constrained [agents] with human oversight” and well-defined processes can yield reliably good results—exactly what orchestration enables.
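As a rough sketch of this pattern (all names here are illustrative, not a specific product API), the orchestrator accepts an agent’s output only above a confidence threshold and otherwise falls through to a deterministic rule:

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    output: str
    confidence: float  # assumes the agent reports a 0-1 confidence score

def run_agent(task: str) -> AgentResult:
    """Hypothetical agent call; replace with your agent framework."""
    return AgentResult(output=f"draft answer for {task!r}", confidence=0.62)

def rule_based_fallback(task: str) -> str:
    """Deterministic handling for cases the agent can't own."""
    return f"routed {task!r} to the standard exception queue"

def orchestrate(task: str, threshold: float = 0.8) -> str:
    result = run_agent(task)
    if result.confidence >= threshold:
        return result.output          # agent is trusted above the threshold
    return rule_based_fallback(task)  # otherwise the guardrail path runs
```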
Human-in-the-loop integration: a key benefit of a well-designed orchestration layer is the ease of weaving in human checkpoints. For instance, the orchestration layer can be configured to pause and request human approval if an agent’s confidence is low or its decision has high stakes. This provides the “safety net” needed to deploy agents in critical workflows. Instead of hard-coding human oversight separately for each agent, the common platform can offer a consistent interface for escalation to humans, and even learn from human corrections over time. This alignment of AI and human workflows leverages AI speed where appropriate, but always with a human backstop to ensure reliability and trust.
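A minimal sketch of such a checkpoint might look like the following, assuming the agent exposes a confidence score and that policy marks certain tasks as high stakes (the policy and queue here are invented for illustration):

```python
def is_high_stakes(task: str) -> bool:
    """Illustrative policy: escalate anything touching payments or personal data."""
    return any(word in task.lower() for word in ("refund", "payment", "ssn"))

approval_queue: list[dict] = []  # stand-in for a real work queue or inbox

def execute_with_checkpoint(task: str, proposed_output: str, confidence: float) -> str:
    # Pause for a human when confidence is low or the decision is high stakes.
    if confidence < 0.8 or is_high_stakes(task):
        approval_queue.append({"task": task, "proposed": proposed_output})
        return "pending human approval"
    return proposed_output  # safe to proceed autonomously
```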
Centralized governance and security: a vendor-agnostic orchestration layer can enforce security and compliance uniformly across all AI activities. It can serve as a gateway that monitors what data is sent to each agent, scrubs or anonymizes sensitive information, and logs all agent decisions for audit purposes. This addresses governance concerns by giving organizations a single point of control: for example, administrators could configure which AI models are allowed to handle certain data, or require that certain queries always be handled by an on-premises model due to privacy policies. Such a system could also integrate with identity and access management (IAM) for role-based control over agent actions, and policies (like rate limits or cost budgets) could be applied globally. All these capabilities mean enterprises can adopt AI agents more confidently, knowing there’s an oversight layer to prevent unwanted data leaks or rogue behavior.
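For illustration, such a gateway could be sketched as follows; the allowlist, redaction patterns, and logger are deliberately simplistic placeholders for real policy engines and data-loss-prevention tooling:

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent-audit")

ALLOWED_MODELS = {"on-prem-llm"}  # illustrative policy: sensitive data stays on-prem

def scrub(text: str) -> str:
    """Redact simple PII patterns before anything leaves the gateway."""
    text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)      # US SSN format
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)  # email addresses
    return text

def gateway(prompt: str, model: str) -> str:
    """Single point of control: enforce the allowlist, scrub, and audit."""
    if model not in ALLOWED_MODELS:
        raise PermissionError(f"model {model!r} is not approved for this data")
    safe_prompt = scrub(prompt)
    audit_log.info("model=%s prompt=%s", model, safe_prompt)  # audit trail
    return safe_prompt  # forward to the approved model from here
```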
Interoperability and avoiding lock-in: a neutral orchestration layer lets teams plug in different AI models or services as needed, without being tied to one vendor’s ecosystem. This mitigates the fear of having to rebuild everything if you switch providers. As one engineer advocated, the goal is for end users “[to] not have to worry about vendor lock-in… ensuring AI systems work seamlessly across platforms rather than trapping users within a single vendor’s ecosystem.” By speaking a “common language” to multiple AI backends (OpenAI, Anthropic, open-source models, etc.), an orchestration layer ensures you can always choose the best tool for the job and pivot when technology or pricing changes.
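One plausible shape for that “common language” is a thin adapter interface, sketched below with stubbed providers (the backends and their responses are placeholders, not real SDK calls):

```python
from typing import Protocol

class ChatBackend(Protocol):
    """The common interface every provider adapter must implement."""
    def complete(self, prompt: str) -> str: ...

class OpenAIBackend:
    def complete(self, prompt: str) -> str:
        return "stubbed OpenAI-style response"  # replace with the real SDK call

class LocalModelBackend:
    def complete(self, prompt: str) -> str:
        return "stubbed local-model response"   # e.g. an on-prem open-source model

BACKENDS: dict[str, ChatBackend] = {
    "openai": OpenAIBackend(),
    "local": LocalModelBackend(),
}

def complete(prompt: str, backend: str = "local") -> str:
    # Swapping providers becomes a configuration change, not a rewrite.
    return BACKENDS[backend].complete(prompt)
```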
Multi-agent coordination and specialization: an orchestration platform can manage a team of agents, each specialized for a subtask, and coordinate their interactions deterministically. This reduces complexity for the practitioner—the orchestration layer can handle task routing, state management, and error recovery across agents. Instead of one monolithic agent attempting everything (and often failing unpredictably), you can have simpler agents focused on specific roles, with the orchestration layer linking them. Such a setup can also include rule-based automation or traditional software components for tasks that don’t require AI, ensuring that AI is only used where it adds value. The result is a more robust system where, if one component fails or produces uncertain output, the orchestration platform can catch it (e.g., via validations, retries, or fallbacks).
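A toy sketch of that division of labor, with stub specialists and a simple validate-and-retry loop standing in for real agents and real checks:

```python
def extract_agent(doc: str) -> str:
    return f"fields extracted from {doc!r}"  # stub specialist: extraction

def summarize_agent(text: str) -> str:
    return f"summary of {text!r}"            # stub specialist: summarization

def run_step(step, payload: str, validate, retries: int = 2) -> str:
    """Run one specialist, validating its output and retrying on failure."""
    for _ in range(retries + 1):
        output = step(payload)
        if validate(output):
            return output
    raise RuntimeError(f"{step.__name__} failed validation after retries")

def pipeline(doc: str) -> str:
    # Each specialist does one job; the orchestrator links and checks them.
    fields = run_step(extract_agent, doc, validate=lambda o: "fields" in o)
    return run_step(summarize_agent, fields, validate=lambda o: len(o) > 0)
```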
Cost optimization and resource flexibility: a vendor-agnostic platform can dynamically choose between different models or routes to minimize cost while meeting performance needs. For example, it might use a cheaper local model for simple queries and call a more expensive API only for complex cases—transparently to the end user. It can also batch requests, cache results, or adjust the frequency of agent runs. One team reported doing this manually (adding caching and using a cheaper model for certain tasks); an intelligent orchestration platform could handle such optimizations automatically. Additionally, by not being tied to one vendor, organizations can take advantage of pricing competition—switching to a more cost-effective service if one raises prices. This flexibility improves ROI, since the orchestration layer ensures you’re using resources in the most efficient way (the “best and cheapest” model for each job, as users desire).
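As a toy example of such routing (the model names and the complexity heuristic are invented; a production router might use a trained classifier or historical outcome data):

```python
def call_llm(prompt: str, model: str) -> str:
    """Hypothetical provider call; replace with your client."""
    return f"[{model}] response"

CHEAP_MODEL, PREMIUM_MODEL = "small-local-model", "frontier-api-model"

def looks_complex(prompt: str) -> bool:
    """Crude illustrative heuristic for when premium spend is justified."""
    return len(prompt.split()) > 100

def route(prompt: str) -> str:
    # Easy queries go to the cheap model; hard ones justify the premium one.
    model = PREMIUM_MODEL if looks_complex(prompt) else CHEAP_MODEL
    return call_llm(prompt, model)
```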
In summary, a proven and trustworthy agentic orchestration layer directly addresses many pain points:
Delivers controlled agency by improving agent reliability without compromising autonomy or utility
Blends AI with humans and robots for better outcomes
Abstracts multi-agent complexity into a manageable framework
Avoids single-vendor limitations
Provides the governance needed for enterprise-scale deployment
By learning from real-world struggles—repeatability issues, integration headaches, security fears, cost overruns—such a solution can empower practitioners to harness AI agents with far less friction and risk. The result is an AI agent ecosystem that is more reliable, adaptable, and aligned with business needs, allowing teams to focus on solving problems rather than fighting the infrastructure.
Chief Product Officer, UiPath