1 Introduction

Robotic process automation (RPA) is a powerful tool that can automate repetitive and time-consuming tasks and routines. However, like any software, RPA bots can be susceptible to errors. Running inefficient RPA bots can lead to time-consuming governance or even a rollback to the manual process (Kedziora and Penttinen, 2021; Modliński et al., 2022).

A minority of erratic RPA bots cause the majority of the time spent on required repairs. RPA developers are burdened by requests from business and process owners to change processes. They frequently deploy new, unstable RPA bots to production and upgrade them on the fly, which leads to erratic behavior of RPA bots with many performance issues. One of the solutions proposed by RPA developers from Nordea Bank is improved handling of erratic behavior in RPA bots (Kedziora and Penttinen, 2021). This need is also supported by Syed et al. (2020) in their research challenges for RPA. Any tool that would help developers spend less time governing RPA bots would be useful (Kedziora and Penttinen, 2021).

In the process sciences, such as process mining, business process management, and operational research, there exist many methods to investigate undesired behaviors (performance issues). We can build upon previous research in the process sciences and investigate whether any of these methods are applicable in the RPA domain. Because RPA is software used for the automation of processes, most of these techniques can also be used in the RPA field (Průcha, 2021). The process sciences differentiate multiple types of undesired behaviors in processes, namely: errors, failures, deviations, variability, producing waste, inefficiencies, and processing issues (Dumas et al., 2013; van der Aalst, 2016). All of these undesired behaviors are also applicable to RPA processes, as described in detail in Table 1.

Because RPA is software, it provides easy access to a wealth of information about the process. Any errors or failures encountered during the process are included in the success-rate statistics of the RPA platform. However, it is not feasible to study the production of waste or inefficiencies without knowledge of the process or domain-specific information. The only practical advice in this regard is to use good design patterns for RPA bots to minimize waste production and mitigate inefficiencies (Prucha and Skrbek, 2022).

Processing issues related to the inability to place newly arrived items into queues are commonly associated with capacity problems, which fall under queueing theory. These undesirable behaviors can also be observed through the dashboard of RPA platforms, which provide information on the average duration of cases/items and the available capacity of RPA bots. To optimize capacity, it is crucial to improve the scheduling of RPA bots (Séguin and Benkalaï, 2020).

The deviations and variability of RPA bots are currently unknown since platforms do not track these metrics. However, from the perspective of process sciences, deviations and variability are generally considered undesired. In this research, we aim to investigate the influence of deviations and variability in RPA processes and explore their potential as tools for better understanding erratic behavior.

To achieve the objectives of this research, we have formulated the following research question that needs to be addressed:

RQ1

How do deviations and variability of RPA bots influence their behavior and performance?

The structure of this paper is organized as follows: Firstly, we have conducted an analysis of the related work. This is then followed by an introduction to our approach for addressing the research question. Next, we present the methodology employed in our research, followed by the results and a discussion to contextualize these results within existing research. Finally, we conclude our work.

2 Related work

As mentioned in the introduction, process sciences differentiate between multiple undesired behaviors in processes (Dumas et al., 2013; van der Aalst 2016). The negative behaviors of processes are also applicable to RPA processes. In Table 1, we take the most common negative behaviors in the processes and map them onto a field of RPA. Each negative behavior contains a description, a potential cause, and an example in RPA.

Table 1 Types of undesired behaviors in processes

As stated in the research objective of this paper, although multiple undesired behaviors can influence the performance and behavior of an RPA process, we chose to investigate the variability and deviations of RPA because of the limited existing knowledge of variability in RPA.

In process science, a well-known approach for detecting process performance and behavior is statistical process control (SPC). SPC is used for various applications, one of which is the detection of variability and deviations within processes. Many publications in operations research have already addressed this issue. SPC has crossed the boundaries of the scientific community very well, and the same principles are widely used in industry, for example in Six Sigma. Variability and variance of processes are also among the focuses of process mining. In this chapter, we try to find relevant publications about variability, deviations, and statistical process control of RPA and their impacts on RPA.

2.1 Variability and SPC in process mining

One of the goals of process mining is to convert data into valuable insights with which organizations can make decisions (van der Aalst, 2022). The quality and preparation of data for process mining is very time-consuming and, therefore, process mining should provide significant added value and companies should use the insights to take action (De Weerdt and Wynn, 2022; van der Aalst 2016). Using process mining on RPA processes is currently not very common. However, it can be applied to automated processes for RPA (Egger et al., 2020; Průcha, 2021), where duration rates, bottlenecks and deviations can be identified. With advanced methods of process mining such as predictive process monitoring, enhancing the process data with data from other systems, root-cause analysis and online monitoring, it is possible to gain more valuable insights about the behavior of RPA processes including information on why an RPA bot fails, whether the bot fails when executing a particular variant and whether RPA bots can keep up with the queue of cases/items (Burattin, 2022; Di Francescomarino and Ghidini, 2022; Fahland, 2022; Van Houdt and Martin, 2021).

Statistical methods represent the core of process mining. The most commonly used methods include root-cause detection, predictive process mining, and explainable AI in process mining (Garcia et al. 2019; Jans et al. 2013; Tiwari et al. 2008; van der Aalst 2016). Descriptive and inferential statistics are therefore very often used for process performance analysis to describe effectively what is happening in the process. Process variability is certainly one of the topics that resonates in process mining; for the query “process variability in process mining”, we obtained 2,250 results in WOS. In process mining, process variability can be separated into two basic categories:

  • Whole process variability - how many variations exist.

  • The variability of a single variant or process - for example, how many runs of the process are consistent over time (Ayora et al. 2013).

In our research, we focus on the variability of a single variant or process over time. Results regarding whole-process variability are not relevant to this study. The number of studies focusing on process variability in terms of duration is limited. There are 48 results in the WOS database for the query variability in duration process mining and 129 results in the Scopus database for the query variability in duration “process mining”. Based on the relevance of the results, the query varied by quotation marks for each database. From the results obtained, as well as additional investigation, we could not find research that addresses process behavior based on duration, variability, and/or robotic process automation.

A significant number of results were related to process variability in healthcare, where processes are typically variable by nature (Munoz-Gama et al. 2022). There is also a study that focuses specifically on process duration, using it to investigate process behavior and performance and visualizing the results with a purpose-built graph called the performance spectrum (Denisov et al. 2018). Furthermore, there is research on clocking process time in transistor-liquid crystal display manufacturing to reduce variability and error behavior (Kang et al. 2016). These studies show that statistical methods can be used to analyze process behavior and to reduce or detect problematic behavior.

2.2 Statistical process control

Statistical process control has been used primarily in manufacturing, where limits and requirements are rigidly defined (MacGregor and Kourti 1995). In this environment, methods such as the cumulative sum chart (CUSUM), the exponentially weighted moving average (EWMA), and process capability work very well (Woodall 2000). Despite the ingenuity and potential of these techniques, they always need initial values to be used, such as the mean, a lower bound limit, and an upper bound limit. In pharmaceuticals, chemistry, construction, and engineering, where SPC has spread, such limits and standards are very often given (Stoumbos et al. 2000). In fields that do not have strict rules and standards, the use of these conventional SPC methods is more complicated.
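To make this prerequisite concrete, a minimal EWMA control chart can be sketched as follows; the function, its smoothing parameter `lam`, and limit width `L` are our own illustrative choices (conventional textbook defaults), not any cited implementation:

```python
import math

def ewma_chart(xs, target, sigma, lam=0.2, L=3.0):
    """Minimal EWMA control chart: for each observation, report whether
    the smoothed statistic falls outside the control limits. A known
    target mean and sigma must be supplied up front -- exactly the
    prerequisite discussed above."""
    z = target
    flags = []
    for i, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        # exact time-varying control limit for the EWMA statistic
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * i))
        )
        flags.append(abs(z - target) > half_width)
    return flags
```

An in-control series never trips the limits, while a shifted series does after a few observations; however, this only works because `target` and `sigma` were given beforehand, which is precisely what is missing in less standardized fields.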

Since there are fields where limits and norms are not strictly specified, such as most business processes, it is necessary to use other SPC methods such as variance statistics. There are approaches for using variance statistics for SPC such as comparing a multivariable process using a covariance matrix (Tang and Barnett 1996a, 1996b). However, this approach reveals little about what is happening in the process and its behavior.

2.3 Measuring RPA performance

The most common key performance indicators (KPIs) for RPA bots have, from the start, been full time equivalent (FTE) hours saved and from these, the return on investment (ROI) of process automation has subsequently been calculated (Axmann et al. 2021; Wewerka and Reichert 2020). In the early days of implementing RPA in organizations, this was also one of the main indicators for selecting processes for automation (Anagnoste 2017; Willcocks et al. 2015). With the advances in RPA, other benefits of robotic process automation have become apparent, such as mitigating process errors, providing 24/7 service, recording process steps for legislation, and increasing employee satisfaction with automation of routine activities (Aguirre and Rodriguez 2017; Schuler and Gehring 2018; Syed et al. 2020). With these advances, companies have begun to measure new KPIs such as run-time (velocity), cost per error, license utilization, exception rate, and average automation uptime (Aguirre and Rodriguez 2017; Kokina and Blanchette 2019; Blueprint, 2021; Syed et al. 2020; Teodorczuk 2021; Wanner et al. 2019).

Views are also emerging that, in addition to the listed KPIs, firms should measure KPIs that are not common but provide additional value and decision support to organizations including break-fix cycle, break-fix person hours, break root causes, business value lost in downtime and consistency (Casey 2019; Blueprint, 2021).

3 Approach

In our research, we focused on answering the research question mentioned above. To answer it, we formulated two hypotheses based on the literature review.

For RQ1 we formulated the following hypotheses:

Hypothesis 1

Variability is related to the performance of an RPA bot (success rate).

Hypothesis 2

Deviations (Outliers) are related to the performance of an RPA bot (success rate).

For validation of the two hypotheses, we selected appropriate indicators of statistical dispersion (variability) which may faithfully describe the behavior of RPA bots based on the variability of the processing time of a single case (item).

We computed these values using real RPA processes, where we prepared the processing times for each case processed by an RPA bot. The research process is displayed in Fig. 1.

Fig. 1 The research process

RPA processes were selected across various industries: 10 out of 12 processes are from the banking industry, where internal workflows are executed within the bank infrastructure. The remaining RPA processes are customized automations of HR processes working with attendance systems. The selected processes come from two RPA platforms, namely Blue Prism and UiPath. Out of the 12 processes, we selected two benchmark processes to serve as examples for comparison: these two processes work perfectly, while the other RPA processes are very problematic in that they exhibit extensive error behavior requiring repair. The two benchmark processes came from the same bank, and both used the same RPA platform version from the same distributor. The non-problematic processes have been in operation for more than 2 years and have undergone multiple optimizations. The problematic processes are just after roll-out to production, in the hypercare phase of RPA. That is why they are problematic: the production environment can sometimes cause processes to behave differently than the development environment did.

Basic information about the processes is presented in Table 2. A more comprehensive description of the processes is prohibited by a non-disclosure agreement.

Table 2 Basic data on benchmark processes

3.1 Methodology for selecting indicators of statistical dispersion

There are surprisingly many methods for measuring variability and deviation from the norm. The most common and widely used measures in statistics are the standard deviation, the variance, and the coefficient of variation.

There is a significant difference between the results of these three coefficients. The standard deviation and variance always yield results in absolute units, while the coefficient of variation yields a normalized, dimensionless result. Because RPA robots can take different amounts of time to process a case/item, depending on the number of steps in the process and the operations performed, the absolute duration values change accordingly. It is therefore not possible to compare two different processes based on the standard deviation or variance. Hence, for our measurements, we only selected indicators that are normalized, like the coefficient of variation. We show the difference between the results of absolute values and normalized values in Table 3.

Table 3 Difference between absolute value indicators and normalized indicators
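The point can be illustrated with a small Python sketch; the durations below are synthetic, hypothetical values, not the data from Table 3:

```python
import statistics

# Two hypothetical processes with the same relative spread of case
# durations (in seconds) but very different absolute scales.
short_cases = [10, 12, 11, 9, 13]
long_cases = [1000, 1200, 1100, 900, 1300]

for name, xs in [("short", short_cases), ("long", long_cases)]:
    sd = statistics.pstdev(xs)      # absolute: grows with the time scale
    cv = sd / statistics.fmean(xs)  # normalized: comparable across processes
    print(f"{name}: sd = {sd:.2f} s, cv = {cv:.3f}")
```

The standard deviation differs by two orders of magnitude between the two processes even though their relative behavior is identical; the coefficient of variation is the same for both, which is what makes cross-process comparison possible.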

3.2 Introduction of the indicators

To start with, Table 4 introduces the concepts needed to understand the calculations of the selected indicators: the name is in the first column, the symbol in the second, and the description in the third. Table 5 presents the selected indicators that have already been researched.

Table 4 List of values used for calculations
$$\sigma=\sqrt{\frac{\sum_{i=1}^{n}\left|x_{i}-\mu\right|^{2}}{n}}$$
(1)
$$\mu=\frac{1}{n}\sum_{i=1}^{n}{x}_{i}$$
(2)
$$IQR={Q}_{3}-{Q}_{1}$$
(3)
$$Oc1SD=\left|\left\{x_{i} : x_{i}<\mu-\sigma \,\vee\, x_{i}>\mu+\sigma\right\}\right|$$
(4)
$$OcIQR=\left|\left\{x_{i} : x_{i}<Q_{1}-1.5\,IQR \,\vee\, x_{i}>Q_{3}+1.5\,IQR\right\}\right|$$
(5)
Table 5 Selected indicators of statistical dispersion
$$CV=\frac{{\sigma}}{{\mu}}$$
(6)
$$CD=\frac{1}{n}\cdot\frac{\sum_{i=1}^{n}\left|x_{i}-med\right|}{med}$$
(7)
$$CMD=\frac{1}{n}\cdot\frac{\sum_{i=1}^{n}\left|x_{i}-\mu\right|}{\mu}$$
(8)
$$\mathrm{CIQR}_{90}=\frac{Q_{0.95}-Q_{0.05}}{Q_{0.95}+Q_{0.05}}$$
(9)
$$GC=\frac{\sum_{i=1}^{n}\,\left(2i-n-1\right){x}_{i}}{n\sum_{i=1}^{n}{x}_{i}}$$
(10)
$$OoOS=\frac{Oc1SD}{n}$$
(11)
$$OoIQR=\frac{OcIQR}{n}$$
(12)
$$CR=\frac{H-S}{H+S}$$
(13)

To find the number of outliers used in OoIQR and OcIQR, we applied the same calculation that is used to display outliers in a boxplot. Note that Oc1SD and OcIQR are only counts of the values that exceeded a defined threshold; they are not the individual values themselves.
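For concreteness, the indicators above can be implemented in a few lines of Python. This is a minimal sketch using linearly interpolated quantiles, not the library used in the study:

```python
import math

def indicators(xs):
    """Compute the dispersion indicators for a list of per-case
    durations: CV, CD, CMD, CIQR90, GC, OoOS, OoIQR, and CR."""
    n = len(xs)
    s = sorted(xs)
    mu = sum(xs) / n                                        # mean
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)   # std. deviation

    def q(p):  # linearly interpolated quantile of the sorted sample
        k = (n - 1) * p
        f, c = math.floor(k), math.ceil(k)
        return s[f] + (k - f) * (s[c] - s[f])

    q1, q3, med = q(0.25), q(0.75), q(0.5)
    iqr = q3 - q1

    # Counts of values beyond one sigma / beyond the boxplot whiskers
    oc1sd = sum(1 for x in xs if x < mu - sigma or x > mu + sigma)
    ociqr = sum(1 for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr)

    return {
        "CV": sigma / mu,                                   # coeff. of variation
        "CD": sum(abs(x - med) for x in xs) / (n * med),    # coeff. of dispersion
        "CMD": sum(abs(x - mu) for x in xs) / (n * mu),     # coeff. of mean deviation
        "CIQR90": (q(0.95) - q(0.05)) / (q(0.95) + q(0.05)),
        # Gini coefficient over the sorted sample
        "GC": sum((2 * i - n - 1) * x for i, x in enumerate(s, start=1))
              / (n * sum(s)),
        "OoOS": oc1sd / n,
        "OoIQR": ociqr / n,
        # coeff. of range, with H the highest and S the smallest value
        "CR": (s[-1] - s[0]) / (s[-1] + s[0]),
    }
```

Applied to a list of processing times in seconds, the function returns all normalized indicators at once, so one call per process suffices to populate a table such as Table 7.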

3.3 Methodology of calculation used

We prepared the data for every single process, working only with the times required to perform individual cases (the time to complete one item in a process). Thus, for every process, we collected all the cases performed by one RPA robot and extracted the time needed to accomplish each item. Each process has a different number of measured cases, depending on the bot’s workload. The case counts are always higher than 90 for a single RPA process, so they are sufficient for using descriptive statistical methods. A snippet of the data can be seen in Table 6. The data in the table are rounded to seconds. For the calculation, we converted the times to seconds and rounded only the result of the calculation. The Python language and Jupyter notebooks were used for data analysis; in addition, a library was programmed to calculate the above formulas. After calculating all the indicators, we used the Pearson correlation coefficient to test the dependence between reality and the indicators and thus verify the accuracy of the prediction.

Table 6 Sample data
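The final verification step can be sketched as follows; the indicator and success-rate values below are hypothetical placeholders, not the study's data:

```python
# Hypothetical per-process values: coefficient of mean deviation (CMD)
# and success rate (SR, in %), one pair per RPA process.
cmd = [0.05, 0.08, 0.20, 0.35, 0.50, 0.62]
sr = [99, 97, 90, 80, 70, 60]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

r = pearson(cmd, sr)
print(f"r = {r:.2f}")  # strongly negative: higher variability, lower SR
```

Computing this coefficient for every indicator against the success rate yields the correlation matrix discussed in the results.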

Indicators such as CV, CD, CMD, CIQR90, and GC are less affected by outliers due to their calculation method, which involves either the use of mean values or a subset of the entire distribution. These indicators are better suited for assessing the overall behavior of a process and are more closely associated with H1. On the other hand, indicators like OoOS and OoIQR are directly linked to outliers, while CR is greatly influenced by them. These indicators are more relevant to H2.

4 Results

Table 7 presents the values of the indicators for every process we have data on, together with basic information such as the success rate (SR) and the percentage of outliers outside the IQR out of the total (OoIQR). The results in Table 7 are rounded to 2 decimal places, except for the SR indicator, which is reported in whole numbers.

Table 7 Values of statistical dispersion indicators

We used a correlation matrix to test our assumptions about which indicator best predicts error behavior as well as the number of outliers. The correlation matrix can be found in Fig. 2.

Fig. 2 Correlation between reality and the indicator

The most interesting finding from the correlation matrix is that most of the variability indicators have a strong connection with the success rate; specifically, the coefficient of variation, the coefficient of dispersion, the coefficient of mean deviation, and the Gini coefficient all exhibited a direct correlation with the success rate. The correlation matrix clearly shows that the coefficient of mean deviation (CMD) has the strongest relationship with the success rate, at −0.92. The coefficient based on the 90% quantile range shows only a moderate dependency. The coefficient of range and the number of outliers outside the IQR show no dependency on the success rate. The number of outliers outside one sigma displays a low influence (−0.34) on the success rate. This interesting finding relates to Hypothesis 2 (H2): the success rate of an RPA process is not influenced, or at best weakly influenced, by outliers. The results also show that the average duration of processing cases/items has a low impact (−0.31) on the success rate.

At the beginning of this study, we proposed a research question and then two hypotheses to frame the objective of the research:

RQ1

How do deviations and variability of RPA bots influence their behavior and performance?

Hypothesis 1

Variability is related to the performance of an RPA bot (success rate).

Hypothesis 2

Outliers are related to the performance of an RPA bot (success rate).

Hypothesis 1

Based on our results, we can say that variability is related to the performance of an RPA bot.

Hypothesis 2

Based on our results, we were not able to prove that outliers significantly influence the performance of an RPA bot; i.e., their influence is at most weak.

Thus, as an answer to the RQ, we found that variability influences the behavior and performance of an RPA bot. Deviations (outliers) do not significantly influence the performance of RPA bots.

5 Discussion

In this research, we aimed to investigate the impact of deviations and variability in RPA processes on the performance of RPA bots. Our objective was to validate the assumption that measuring variability can offer valuable insights for managing RPA bots.

Process variability is a common phenomenon and does not only occur in RPA processes, but is also common in industries where accuracy matters much more, such as in chemistry, manufacturing, and the pharmaceutical industry (Munoz-Gama et al. 2022). There is also some variability in these industries, yet the variability is much smaller and this is due to the stringency and standards they are subject to (MacGregor and Kourti 1995). High variability, many outliers, and a distribution of values that is flatter than the normal distribution most likely indicate some problems in a process, regardless of the industry (Mapes et al. 2000; Munoz-Gama et al. 2022; van der Aalst 2016). Over time, approaches to measure and remove this variability have been developed. Examples of such approaches include Statistical Process Control, process capability, six sigma methods, and some process mining methods.

Our research focused primarily on variability development over time and more precisely, on the values of the duration of RPA bot case processing. Time variability is dealt with, for example, in manufacturing on production lines, where physical robots are used rather than software robots and it is necessary that all robots are synchronized to the production clock (Doyle Kent and Kopacek 2021). Even though RPA bots are software and one would expect variability to be machine accurate, this is not the case. Quite often, additional variables enter into the processing that can prolong or complicate it. These complications can also occur in normal business processes. Typical examples might be bad input data, a bad form, or a bad format. Also, atypical variations of the process that are not as frequently dealt with include, for example, payment in a currency other than the default currency or processing data from within a foreign country or outside the country’s economic space. Therefore, variability provides a good description of process behavior based on the input and the issues that affect it (Mapes et al. 2000; van der Aalst 2016). Variability can also give a good indication of the maturity of a process and whether it is suitable for automation, thus, assisting in the selection of processes for automation. It can also serve in process selection screening for task-mining methods (Leno et al. 2021).

To enhance the selection and description of process behavior and the impact of deviations and variability, modified methods of statistical dispersion can be employed. For instance, the coefficient of variation is more responsive to outliers compared to the coefficient of mean deviation. This implies that even if a process has a high success rate with no exceptions or errors, there may be occasional delays in the infrastructure where the robot operates, resulting in outliers. The coefficient of mean deviation (CMD), on the other hand, is less sensitive to outliers and provides a clearer picture of the primary process flow, which can be advantageous. Similarly, skewness can be used as an additional descriptive statistic to assess the distribution of process items over time.
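This differing sensitivity is easy to demonstrate on synthetic durations; the values below are illustrative, not measured data:

```python
import math

def cv_and_cmd(xs):
    """Coefficient of variation and coefficient of mean deviation
    for a list of per-case durations."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    cmd = sum(abs(x - mu) for x in xs) / (n * mu)
    return sigma / mu, cmd

stable = [60] * 99            # a stable process, 60 s per case
spiked = [60] * 99 + [600]    # one hypothetical 10-minute infrastructure delay

cv0, cmd0 = cv_and_cmd(stable)
cv1, cmd1 = cv_and_cmd(spiked)
# The single outlier inflates CV far more than CMD, because CV squares
# deviations while CMD averages their absolute values.
print(f"CV: {cv0:.3f} -> {cv1:.3f}, CMD: {cmd0:.3f} -> {cmd1:.3f}")
```

A single delayed case lifts the CV several times higher than the CMD, so the CMD remains a truer picture of the primary process flow, as argued above.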

Interestingly, our results indicate that deviations (outliers) do not impact the performance (success rate) of RPA bots. This sheds new light on deviations, which used to be a concern for RPA process owners; in manual processes, by comparison, deviations and outliers are problematic. This behavior can be explained by the inherent nature of RPA processes: if there is a high error rate resulting from input-data issues or problems in the robot’s flow, the process is likely to terminate within the expected time, generating fewer outliers.

However, the high variability in RPA processes supports the theory that variability is undesirable. Variability strongly correlates with performance across all indicators sensitive to variability. Therefore, variability can be an additional measure to gauge the success rate and identify issues in deployed RPA processes. If the success rate is low but the variability is not significant, there is likely a specific problem with the system, application, or input data format that the bot cannot handle, leading to consistent errors. Conversely, high variability and low success rate may be caused by sudden errors (system exceptions), indicating infrastructure issues or problems with highly variable input data that require improved data quality. These insights help in triaging RPA problems and reducing maintenance time by enabling faster problem investigation and appropriate assignment.
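This triage logic can be sketched as a simple rule of thumb; the function and its thresholds are illustrative assumptions to be tuned per bot portfolio, not values derived in this study:

```python
def triage(success_rate, cmd, sr_low=0.95, cmd_high=0.3):
    """Rough triage of a deployed RPA process from its success rate and
    coefficient of mean deviation (CMD). Thresholds are illustrative."""
    if success_rate >= sr_low:
        return "healthy: no action needed"
    if cmd < cmd_high:
        # low SR, low variability: the bot fails the same way every time
        return ("consistent errors: suspect a specific system, application, "
                "or input-format problem the bot cannot handle")
    # low SR, high variability: failures arrive irregularly
    return ("sudden errors (system exceptions): suspect infrastructure "
            "issues or highly variable input-data quality")

print(triage(0.99, 0.05))
print(triage(0.70, 0.10))
print(triage(0.70, 0.55))
```

Such a rule lets incoming repair requests be routed immediately, to application or data-format owners in the consistent-error case, and to infrastructure or data-quality teams in the sudden-error case.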

The variability of time poses challenges in complying with company standards, meeting service level agreements (SLAs), and achieving key performance indicators (KPIs). Process owners rely on RPAs to process customer demands and provide services. If RPAs are unreliable and the processing time varies from 2 min to 15 min, it can lead to discomfort for both customers and employees at the service counter. Consequently, RPAs fail to enhance the customer experience and may even make it worse. Time variability also complicates scheduling of RPA bots and causes delays in case/item processing, negatively impacting overall performance and utilization of the RPA bot portfolio.

We should also comment on a limiting factor of variability in RPA processes that is only partially reflected in this study: the complexity of the selected RPA processes. Unfortunately, we do not have access to the RPA code, so we cannot clearly analyze how complex the processes are. We can only approximate complexity based on time, as more complex processes should take more time. From the results, we know that time has a low correlation with the success rate and no association (except CIQR90, with a low correlation of 0.32) with the indicators of variability. Thus, we assume that time (our approximation of complexity) does not directly influence the behavior of RPA bots.

Our research findings establish the basis for handling variability, triaging RPA problems, and gaining a better understanding of RPA bot behavior. This provides RPA developers with an additional investigative tool, reducing time spent on debugging and assisting in root cause analysis of low RPA bot performance and exceptions leading to termination. Variability provides insights into both RPA performance and the IT infrastructure of the company, including processing capacity, computation power, data quality, and overall performance. It aids in building a framework for seamless exception handling in RPA processes.

6 Conclusion

This study investigates the impact of deviations and variability on robotic process automation (RPA), specifically examining their effect on the performance and behavior of RPA bots. Our findings reveal that variability strongly influences the performance and behavior of RPA bots. In contrast, the impact of deviations (outliers) on RPA is at most weak. These findings are an intriguing contrast to established process theory. This research provides portfolio managers of RPA bots with an additional tool to debug bots, identify the root causes of exceptions, and reduce the cost of repairs and RPA bot downtime. Moreover, the research serves as a basis for the automatic diagnostics of RPA bots.