How To Use Sharding For Big Data in UiPath Process Mining

Sharding Big Data in UiPath Process Mining

Martijn Wijffelaars

March 11, 2021


How do you handle large data volumes in UiPath Process Mining? Nowadays, all data analytics activities face the same challenge: handling big data. Several trends in recent decades have only made this problem worse.

On one hand, the amount of data gathered is immense. In the last few years alone, we have created more than 90% of all the data in our history. It is mind-boggling to imagine how much data this actually is!

The way we handle big data has changed as well. A decade ago, a data analyst could spend hours configuring a data-mining algorithm or writing queries to a reporting system. The data analyst would press a button to execute the query and would wait minutes, sometimes hours, for the answer to their question.

That way of working no longer holds. Like many other data analytics techniques, process mining is making a move towards a more business-oriented audience. This transition really changes the game.

Business users want an easy-to-use tool which gives them relevant insights, fast. They expect a user experience similar to what they know from their smartphones. So, instead of waiting minutes, they want results in seconds.

How does data affect speed?

UiPath enables you to make ‘governed self-service’ process mining applications. What exactly does this mean?

UiPath Process Mining gives business users a contained information space that is very easy to use. Users get the insights they need to optimize their business processes. However, such an application must be fast enough to keep these users engaged.

Our software developers at UiPath love performance. They are continually making step-by-step improvements to the overall speed of the UiPath Process Mining product. But does it only depend on their efforts and extra hours spent in the office?

In fact, many other factors determine the speed of process mining. The following is a very simple rule of thumb that applies to all data analytics tools: “The more data you put in an application, the slower it gets.”

Performance scales with the number of records used in the app. It’s the number of records in your largest dataset that has the biggest impact on performance. In process mining, that usually is the event log itself. Remember, it’s a very simple trade-off. The more data you put in, the slower it gets.

How to improve performance?

One solution is to reduce the number of data records that are loaded. For example, you could limit the time period from ten years to only one year. However, that’s not always desirable.

And what if you want to load a drastically higher number of data records: say, 10, 100, or maybe 1,000 times as many?

Sounds impossible? The UiPath solution to this problem is called “sharding.”

What is sharding?

With sharding, you divide the original dataset into multiple shards. The smaller each shard is, the faster each shard will be. When a user logs in, the corresponding data shard will be loaded.

A typical unit for sharding would be “company code” or “department.” For example, if you have 50 company codes, each shard will contain one company code. Assuming the shards are roughly equal in size and performance scales with the number of records, each shard will be roughly 50 times faster than the original dataset.
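As a rough illustration of the idea (not how UiPath Process Mining implements it internally), the sketch below splits a small event log into one shard per company code. The field names `case_id`, `company_code`, and `activity`, and the helper `shard_by`, are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical event log: each record is one event belonging to a case.
EVENT_LOG = [
    {"case_id": "C1", "company_code": "1000", "activity": "Create order"},
    {"case_id": "C1", "company_code": "1000", "activity": "Approve order"},
    {"case_id": "C2", "company_code": "2000", "activity": "Create order"},
    {"case_id": "C3", "company_code": "2000", "activity": "Create order"},
    {"case_id": "C3", "company_code": "2000", "activity": "Ship goods"},
]


def shard_by(records, key):
    """Split records into shards, one per distinct value of `key`."""
    shards = defaultdict(list)
    for record in records:
        shards[record[key]].append(record)
    return dict(shards)


shards = shard_by(EVENT_LOG, "company_code")
for code, records in sorted(shards.items()):
    print(code, len(records))
```

Every record ends up in exactly one shard, so each shard is smaller than the full event log while keeping the same record structure.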

User management can be isolated per shard, such that users can be managed separately. Using the Process Mining User Sync functionality, information about who belongs to which shard can be loaded automatically without extra configuration for each new user.
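The link between a user and a shard can be pictured as a simple lookup. The sketch below is purely illustrative and does not reflect the actual User Sync format or API; `USER_TO_SHARD`, `SHARDS`, and `load_shard_for_user` are hypothetical names.

```python
# Hypothetical data: which shard each user belongs to, and the shards themselves.
USER_TO_SHARD = {"alice": "1000", "bob": "2000"}
SHARDS = {
    "1000": [{"case_id": "C1", "activity": "Create order"}],
    "2000": [{"case_id": "C2", "activity": "Create order"}],
}


def load_shard_for_user(user, user_to_shard, shards):
    """Return only the records of the shard assigned to `user`."""
    if user not in user_to_shard:
        raise PermissionError(f"No shard assigned to user {user!r}")
    return shards[user_to_shard[user]]


print(load_shard_for_user("alice", USER_TO_SHARD, SHARDS))
```

A user who has no shard assigned gets an error instead of silently seeing another shard's data.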

Development is easy because you only have to develop, maintain, and deploy one single application. It can be used for all shards, because the data structure of each shard is the same.

Now, you might be wondering: what if I want to compare all my company codes? Is that still possible with sharding?

Benchmark shards

While sharding vastly improves performance per shard, you lose the ability to compare across shards. To get that overview back, Process Mining has “benchmark shards” that combine the data of multiple shards into one benchmark.

To make sure the benchmark shard performs better than the original dataset, we must somehow reduce the data per shard. There are multiple ways to do that.

1. Pre-aggregation

We can pre-aggregate values over shards, or any other attribute in the dataset. You can then no longer run every detailed analysis, but you are still able to compare differences across shards.

2. Lower level of detail

With Process Mining, a typical benchmark shard removes levels of detail in the events. We can filter out all fine-grained events, and only keep the high-level events. This enables you to compare processes on a coarse level.

3. Tagging

The unique ability to tag interesting situations in Process Mining works like a charm in combination with benchmark shards. You can even remove all event data and keep the tags of their respective cases. This makes it easy to compare tags over multiple shards.
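The three reduction strategies above can be sketched in a few lines of plain Python. This is a minimal illustration, not the product's implementation: the `level` field marking an event as high-level or fine-grained, the `tags` list repeated on each event of a case, and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical shards keyed by company code; field names are illustrative.
SHARDS = {
    "1000": [
        {"case_id": "C1", "activity": "Create order", "level": "high", "tags": ["late payment"]},
        {"case_id": "C1", "activity": "Change price", "level": "fine", "tags": ["late payment"]},
    ],
    "2000": [
        {"case_id": "C2", "activity": "Create order", "level": "high", "tags": []},
        {"case_id": "C3", "activity": "Create order", "level": "high", "tags": ["rework"]},
        {"case_id": "C3", "activity": "Edit address", "level": "fine", "tags": ["rework"]},
    ],
}


def pre_aggregate(shards):
    """1. Pre-aggregation: keep only case and event counts per shard."""
    return {
        code: {"cases": len({e["case_id"] for e in events}), "events": len(events)}
        for code, events in shards.items()
    }


def high_level_events(shards):
    """2. Lower level of detail: drop fine-grained events, keep high-level ones."""
    return [
        {**e, "shard": code}
        for code, events in shards.items()
        for e in events
        if e["level"] == "high"
    ]


def tags_per_shard(shards):
    """3. Tagging: drop the event data and count each tag once per case."""
    result = {}
    for code, events in shards.items():
        case_tags = {(e["case_id"], tag) for e in events for tag in e["tags"]}
        result[code] = Counter(tag for _, tag in case_tags)
    return result


print(pre_aggregate(SHARDS))
print(len(high_level_events(SHARDS)))
print(tags_per_shard(SHARDS))
```

Each function returns far fewer rows than the combined shards would contain, which is what keeps a benchmark shard smaller than the original dataset.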

Combining shards

The combination of a benchmark shard and many normal shards gives you the best of both worlds: a high-level overview to compare shards, and the possibility to zoom in to a specific shard to see all the fine-grained details available.

UiPath Process Mining gives your business users a great user experience by switching seamlessly from the benchmark shard to a specific shard and back. High-level management can see the overall picture, while you can still zoom in to all the details. And the cherry on the cake: all of this can be done at great speed!

Learn more about UiPath Process Mining on our Academy!

Topics:

Process Mining

Martijn Wijffelaars

Product Management Director, UiPath