Tips for Preparing Your Dataset for Process Mining



Preparing your dataset and connecting it to process mining are the two most important steps of a process mining project. If they are done incorrectly, you risk obtaining unreliable insights and derailing the investment and hard work put into your project.

Thankfully, there are many best practices and features within UiPath Process Mining that are designed to make preparing and connecting your dataset simple, accurate, and efficient. And of course, we have a variety of resources to help guide you, like our Academy course and Forum, along with our experts, as you kick off your process mining project.

Identify your attributes

As you might already know, three different attributes are necessary for process mining:

  • Case ID: a unique identifier necessary to track a case throughout a process, e.g., quality notification or purchase order number.

  • Activities: these are the various states that a case can go through, e.g., outgoing payment created or goods receipt.

  • Event: an instance of an activity with a time stamp.

When preparing the data, we recommend involving a person with solid knowledge of the specific process (i.e., a domain expert). This person provides valuable input about the attributes that could serve as case IDs, activities, or other relevant attributes. 
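To make the three attributes concrete, here is a minimal Python sketch of an event log. It is an illustration only and not how UiPath Process Mining stores data internally: the `Event` class, the `traces_by_case` helper, and the sample purchase order numbers are all hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # e.g. a purchase order number
    activity: str        # e.g. "Goods receipt"
    timestamp: datetime  # when this instance of the activity occurred


def traces_by_case(events):
    """Group events per case and list each case's activities in time order."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case_events, key=lambda e: e.timestamp)]
        for case_id, case_events in grouped.items()
    }


# Hypothetical sample data, for illustration only.
events = [
    Event("PO-1001", "Goods receipt", datetime(2021, 3, 4, 9, 30)),
    Event("PO-1001", "Purchase order created", datetime(2021, 3, 1, 8, 0)),
    Event("PO-1002", "Purchase order created", datetime(2021, 3, 2, 10, 15)),
    Event("PO-1001", "Outgoing payment created", datetime(2021, 3, 10, 14, 0)),
]

traces = traces_by_case(events)
```

Each entry in `traces` is the ordered path one case took through the process, which is the kind of structure a domain expert can sanity-check against how the process is supposed to run.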

Set goals with your process in mind

Countless systems record every step in a process, and within enterprise resource planning (ERP) systems there are many different processes. Think of purchase-to-pay or order-to-cash. Every step is recorded in a database, which can then be mined. Nonetheless, you still have to think beforehand about what your goal is for the specific process. For instance, when mining the purchase-to-pay process, one of the goals may be to reduce purchase order cycle times.

Depending on the processes and goals, you can then decide which attribute to use as the case ID. Often, processes in ERP systems can be looked at from various perspectives. If the perspective is clear, you likely have an idea of how the 'case' moves through the process and which activities it passes through. With that knowledge, you know which tables are needed to obtain complete information.

Connecting to UiPath Process Mining

Most of the time, half the work is knowing the structure of the database and its tables. Fortunately, databases can be directly connected to UiPath Process Mining via Microsoft Open Database Connectivity (ODBC). Having an IT specialist present during this phase is highly recommended.

A helpful advantage of UiPath Process Mining is that extract, transform, load (ETL) is built into the tool. This lets you directly query any database that supports ODBC, which makes getting data into Process Mining simple, gives you greater flexibility in data selection, and saves the cost of an intermediate tool.

Furthermore, ODBC prevents a lot of manual labor and allows you to refresh the data via scripts. That saves you from having to export data from the database to files and then load those files into Process Mining, a manual process that is error-prone. You only need to set up the connection once. Also, getting the data from the database directly means that you don't have to worry about the parse settings.

Apart from that, it's worthwhile considering what information needs to be exported. It may be unnecessary to export all columns for the insights you need. Fewer columns will result in a lower disk space requirement for the server. Also, it's good practice to avoid open text fields, keeping the potential for human error at a minimum.
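As a rough illustration of exporting only the columns you need, the Python sketch below builds a SELECT statement from an explicit column list instead of using `SELECT *`. The helper `build_extract_query` and the table and column names are hypothetical, and the identifier check is a deliberately simple assumption about what valid names look like in your database.

```python
import re

# Assumption: plain identifiers only (letters, digits, underscores).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_extract_query(table, columns):
    """Return a SELECT statement that names only the columns you need."""
    if not columns:
        raise ValueError("list at least one column")
    for name in [table, *columns]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT {', '.join(columns)} FROM {table}"


# Hypothetical table and column names, for illustration only.
query = build_extract_query(
    "purchase_order_events",
    ["po_number", "activity", "event_time"],
)
```

Listing columns explicitly also documents which fields the analysis depends on, so open text fields are left out by default rather than by accident.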

Verify your data from ODBC

Once the ODBC connection is set up, we advise you to verify the data that was loaded. In other words, verify whether the correct databases were queried and the necessary fields were retrieved.

Creating a process graph is the easiest way to check the data, as it requires all three attributes and visualizes the process. 

The process graph should look somewhat similar to the process being analyzed. Investigate unexpected events or attributes further, for example an attribute that contains only NULL values.
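To see why a process graph needs all three attributes, consider this small Python sketch that counts how often one activity directly follows another within a case. It is a generic directly-follows count, not the algorithm UiPath Process Mining uses to draw its process graph, and the function name and sample data are hypothetical. Time stamps are assumed to be ISO-8601 strings in a single consistent format, so they sort correctly as text.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count how often one activity is directly followed by another.

    `events` is an iterable of (case_id, activity, timestamp) tuples.
    """
    per_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        per_case[case_id].append((timestamp, activity))

    edges = Counter()
    for case_events in per_case.values():
        # Sort by time stamp so consecutive items are consecutive steps.
        ordered = [activity for _, activity in sorted(case_events)]
        for source, target in zip(ordered, ordered[1:]):
            edges[(source, target)] += 1
    return edges


# Hypothetical sample data, for illustration only.
events = [
    ("PO-1001", "Purchase order created", "2021-03-01T08:00"),
    ("PO-1001", "Goods receipt", "2021-03-04T09:30"),
    ("PO-1001", "Outgoing payment created", "2021-03-10T14:00"),
    ("PO-1002", "Purchase order created", "2021-03-02T10:15"),
    ("PO-1002", "Goods receipt", "2021-03-05T11:00"),
]

edges = directly_follows(events)
```

An edge that a domain expert does not recognize, such as a payment that precedes the purchase order, is a hint that the case ID, the time stamps, or the source tables deserve a second look.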

UiPath Process Mining includes a scan for application issues. This scan will also check your attributes for error values. Most of the time, error values indicate that something is wrong with the connection. 
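If you want a quick check of your own before or alongside the built-in scan, a sketch like the following flags attributes that are empty in every row. It does not reproduce what the UiPath scan does; the function name, the row format (a list of dictionaries), and the sample values are assumptions made for this example.

```python
def null_only_attributes(rows):
    """Return the names of attributes that are None in every row."""
    if not rows:
        return []
    # Collect every attribute name that appears in any row.
    names = {name for row in rows for name in row}
    return sorted(
        name for name in names
        if all(row.get(name) is None for row in rows)
    )


# Hypothetical rows as they might come back from a query.
rows = [
    {"po_number": "PO-1001", "activity": "Goods receipt", "vendor": None},
    {"po_number": "PO-1002", "activity": "Goods receipt", "vendor": None},
]

suspicious = null_only_attributes(rows)
```

Here `suspicious` contains `vendor`, which would be worth tracing back to the query or the source table.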

How do you check whether data is optimized?

With ERP systems, we often see millions of records logged from several years, but users may not need all this information for the insights they're after. To improve performance, it's wise to consider what information is really necessary. If only current year data is needed, it's possible to filter out data in the query assuming ODBC is used.
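The sketch below shows the idea of filtering by date in the query itself. To keep it runnable anywhere, Python's built-in `sqlite3` module stands in for the ODBC source; the table, columns, and sample rows are hypothetical, and time stamps are assumed to be stored as ISO-8601 text. In a real project the equivalent WHERE clause would go into the query you define for your ODBC connection.

```python
import sqlite3

# sqlite3 stands in for the ODBC source so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_order_events "
    "(po_number TEXT, activity TEXT, event_time TEXT)"
)
conn.executemany(
    "INSERT INTO purchase_order_events VALUES (?, ?, ?)",
    [
        ("PO-0900", "Purchase order created", "2019-11-20T09:00:00"),
        ("PO-1001", "Purchase order created", "2021-03-01T08:00:00"),
        ("PO-1001", "Goods receipt", "2021-03-04T09:30:00"),
    ],
)


def events_since(connection, start):
    """Fetch only events on or after `start` (an ISO-8601 string)."""
    cursor = connection.execute(
        "SELECT po_number, activity, event_time "
        "FROM purchase_order_events "
        "WHERE event_time >= ? "
        "ORDER BY event_time",
        (start,),
    )
    return cursor.fetchall()


recent = events_since(conn, "2021-01-01T00:00:00")
```

Filtering at the source means the older rows never leave the database, so they cost neither transfer time nor disk space on the Process Mining server.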

If you can't or don't want to limit the amount of data, and the application becomes slower as a result, sharding may be a good solution. A good split could be a separate application for each process within the ERP system.

Another tip to improve performance is looking at the attributes needed. Often, the process owner will say everything is important while they are only interested in a few attributes. Think logically about what is really important and check if the table fields contain data. 


In conclusion, here are some key considerations when preparing your dataset for UiPath Process Mining:

  • Always try to make an ODBC connection instead of using files.

  • Try to have both an IT specialist and a domain or process expert present when verifying the data.

  • Start with the three essential attributes to verify the data and add attributes to enrich the data afterward.

  • Use common sense: does what you see match expectations, and do you need it all?

Our expert consultants at UiPath support implementing Process Mining from the start, making sure your organization hits its process mining goals.

Happy mining!  


Special thanks to Thijs Ledeboer, Consultant EPS Solution, for their assistance with this blog post.

Rik Schreurs

Software Engineer II, UiPath