Introduction

Workflow management is a key technology for coordinating various business processes, e.g., loan approval and customer order processing (Yin and Kosar 2011; Du et al. 2011, 2012; De Leoni et al. 2013; Hsieh and Lin 2014; Alotaibi 2016). By constructing process models and enacting them in a workflow server, a workflow management system (WfMS) helps enterprises streamline business processes, deliver tasks and documents among users, and monitor the overall performance of the processes.

In real business scenarios, workflow processes usually involve various kinds of business data, such as product names, quantities of raw materials, and amounts of stock. For example, consider two data-aware workflow processes, manufacturing screen (MS) and manufacturing motherboard (MM) (detailed in the “Motivation case” section). During their execution they generate business data such as the quantity of raw material stock, the number of design drawings, and market demand information.

Recently, research on data-aware workflow processes (Trčka et al. 2009; De Leoni et al. 2013, 2014a, b; Lu et al. 2014; Yin and Kosar 2011) has become a hot topic in workflow management, since it can help enterprises analyze correctness and consistency, balance production capacity, produce optimal schedules, and improve product quality. For example, Trčka et al. (2009) focus on data-flow verification and propose an approach to discovering data-flow errors in workflows. De Leoni et al. (2014a, b) use data Petri nets to model processes and propose a technique to check and analyze the conformance of data-aware process models. Lu et al. (2014) propose a new hierarchical Petri net with data to model and verify emergency treatment processes. Yin and Kosar (2011) improve a heuristic-based data-aware algorithm to find an optimal schedule that minimizes the turnaround time of the workflow. De Leoni et al. (2013) present a novel technique for data-aware process mining that first aligns log and model and then applies decision-tree learning algorithms.

The throughput of workflow processes (Sriram et al. 2013; Liu et al. 2008; Pacini et al. 2015; Othman et al. 2012; Zhang et al. 2015) is usually used to measure how many activities completed in a time duration contribute to the completion of the entire batch of processes. It is critical for the scheduling and optimization of workflow processes, especially for real-time applications with dynamic workloads.

For data-aware workflow processes, the throughputs can describe the completion time of activities and value created by the activities in a time duration. Actually, they reflect the workload and gross profit of enterprises over a period of time, respectively. With the concept of throughput, managers can balance the production capacity at each stage and determine how much capital should be recycled over a period of time. Therefore, analyzing throughputs is of great importance to the management of enterprises.

However, the existing research works on data-aware workflow processes (Trčka et al. 2009; De Leoni et al. 2014a, b; Lu et al. 2014; Yin and Kosar 2011; De Leoni et al. 2013) mainly focus on constructing structural models and analyzing their conformance or compatibility. They do not fully investigate the issue of throughput for data-aware workflow processes.

There exist some research works related to the problem of analyzing the throughput of workflow processes, and they can be classified into analysis of throughputs (Sriram et al. 2013; Liu et al. 2008; Pacini et al. 2015; Othman et al. 2012; Zhang et al. 2015; Liu et al. 2013, 2014) and simulation of workflow processes (Pla et al. 2014; Siller et al. 2008; Liu et al. 2012; Cheng et al. 2011; Catalano et al. 2009; Xia et al. 2012; Cheng et al. 2013).

Analysis of throughputs: Sriram et al. (2013) optimize the ordered throughput to improve the efficiency of a workflow system. Liu et al. (2008) regard the workflow throughput as the number of activities completed in a time duration. Pacini et al. (2015) take the number of users serviced in a time duration as the throughput. Othman et al. (2012) propose a method to optimize the throughput, defined as the number of activities executed in a time duration, for workflows. Zhang et al. (2015) propose a new iterative ordinal optimization (IOO) method that aims at generating more efficient schedules from a global perspective over a long period based on workflow throughput. Liu et al. (2013) propose a novel workflow throughput that measures how much the workflow activities completed in a time duration contribute to the completion of the workflow system. Liu et al. (2014) measure the throughput of workflows using simulation.

These research works can be further divided into two categories: activity based approaches (Sriram et al. 2013; Liu et al. 2008; Pacini et al. 2015; Othman et al. 2012; Zhang et al. 2015) and completion time based approaches (Liu et al. 2013, 2014). The former are not reliable for the analysis of workflow processes, since their throughputs only measure the number of activities or workflows completed in a time duration. The throughputs of the latter are not accurate, since they do not consider activities that are not finished completely within a time duration. Furthermore, none of them consider the throughputs of data-aware workflow processes.

Fig. 1 Skeleton of our approach

Simulation of workflow processes: Pla et al. (2014) construct an intelligent workflow management system based on resource-aware Petri nets, which can monitor the workflow and predict the delay of activities. Siller et al. (2008) propose a workflow model that helps manufacturers exchange information, and validate its efficiency by simulating the workflow process in a real business environment. Liu et al. (2012) propose an approach to workflow simulation for operational decision support. Cheng et al. (2011) propose a general and extended Petri net model for the virtual construction of earthmoving operations, whose correctness is verified by workflow simulation. Catalano et al. (2009) propose an ontology-based formalization of the product design process, which can be captured and shared by behavior simulation of the digital shape workflow. Xia et al. (2012) present a performance analysis method for BPEL (business process execution language) processes based on Petri nets, whose feasibility and accuracy are proved by Monte Carlo simulations. Cheng et al. (2013) propose an information constraint Petri net for resource management modeling and simulation in the building design process based on the CPN Tools platform (Jensen et al. 2007).

The research works on simulation of workflow processes focus on simulating and analyzing workflow processes. They do not consider data and the complex relations among data in workflow processes, so they cannot model and simulate data-aware workflow processes. Furthermore, none of the existing methods above consider the throughput of workflow processes from the value aspect in a real business environment (Wang et al. 2010), e.g., production cost, product value, and gross profit. The value factor is very important for the operation of enterprises and cannot be ignored, since a balanced capital chain ensures the smooth execution of business processes and keeps the enterprise lean. Reconsider the two workflow processes mentioned above, manufacturing screen (MS) and manufacturing motherboard (MM): their value throughput reflects the profit created in a time duration and can be used to determine how much capital should be recycled over a period of time based on its changes.

In this paper, we present a new approach to modeling and simulating the time and value throughputs of data-aware workflow processes. As shown in Fig. 1, our approach is divided into the following stages. First, we construct an abstract model with time and value elements. Second, we transform the abstract model into a simulation model in CPN Tools. Finally, we obtain and analyze the time and value throughputs automatically via the simulation logs. Based on these two throughputs, we can help enterprises balance production capacity at each stage as well as determine how much capital should be recycled over a period of time, respectively.

Compared with the existing works (Trčka et al. 2009; De Leoni et al. 2014a, b; Lu et al. 2014; Yin and Kosar 2011; De Leoni et al. 2013; Pla et al. 2014; Siller et al. 2008; Liu et al. 2012; Cheng et al. 2011; Catalano et al. 2009; Xia et al. 2012; Cheng et al. 2013), the contributions of this paper are shown as follows.

(1) This is the first attempt to propose the time and value throughputs of data-aware workflow processes. Both are very important for the management and operation of enterprises, and they can be widely used in the fields of management and operations research.

(2) We propose the whole procedure of modeling and simulating the time and value throughputs of data-aware workflow processes.

(3) We propose the procedure of obtaining the time and value throughputs by analyzing the simulation logs, and we design and develop a prototype system, the throughput analyzer.

The rest of the paper is organized as follows. The “Motivation case and problem statement” section presents the motivation case and the problem statement. The “Constructing abstract model with time and value elements” section proposes an abstract model with time and value elements. The “Transformation of TVWFD-net” section presents the procedure of transforming the abstract model into a simulation model in CPN Tools. The “Obtaining and analyzing time and value throughputs” section presents the procedure of obtaining and analyzing the time and value throughputs. The “Comparison and discussion” section compares existing methods with our approach. The “Conclusion” section concludes the paper.

Fig. 2 MS described by UML activity diagram

Motivation case and problem statement

Motivation case

There are two workflow processes in an enterprise that manufactures electronic equipment, i.e., manufacturing screen (denoted as MS) and manufacturing motherboard (denoted as MM).

As shown in Fig. 2, MS begins with receiving a screen order (Receive orders). Then, three concurrent activities Check stock, Design product, and Analyze requirement execute, and they generate the data quantity of raw material stock (denoted as \(v_{1}\)), number of design drawings (denoted as \(v_{2}\)), and market demand information (denoted as \(v_{3}\)), respectively. After that, these data are summarized (Summarize data). After summarizing the data, the enterprise executes the activity Monitor the production process. At the same time, one of the three activities Manufacture (material is sufficient), Purchase materials (material is in shortage), and Outsource production also executes according to the following three cases: (1) If \(v_{1}\) is more than 1000, \(v_{2}\) is 1, and \(v_{3}\) is less than 5000, MS executes the activity Manufacture (material is sufficient). (2) If \(v_{1}\) is less than or equal to 1000, \(v_{2}\) is 2, and \(v_{3}\) is less than 5000, MS executes the activity Purchase materials (material is in shortage), and then the activity Manufacture executes. (3) Otherwise, the activity Outsource production executes. Finally, MS checks, packs, and transports the products (i.e., the activities Check products and Pack and transport products execute).

Fig. 3 Process of MM described by UML activity diagram

As shown in Fig. 3, MM starts when the enterprise receives a motherboard order (Receive orders). Then two concurrent activities Analyze requirement and Check stock execute, and they generate the data market demand information (denoted as \(v_{4}\)) and quantity of products in stock (denoted as \(v_{5}\)), respectively. Then, these data are summarized and analyzed (Summarize data). After summarizing the data, the enterprise executes one of the three activities Outsource production, Prepare manufacture, and Pick up products from stock (stock is sufficient) based on the following three cases: (1) If \(v_{4}\) is less than 1000 and \(v_{5}\) is less than or equal to 500, MM executes the activity Prepare manufacture. (2) If \(v_{4}\) is less than 1000 and \(v_{5}\) is more than 500, MM executes the activity Pick up products from stock, and then MM executes the activities Produce core component and Produce peripheral components. (3) Otherwise, MM executes the activity Outsource production. Finally, MM checks, packs, and transports the products (i.e., the activities Check products and Pack and transport products execute).

Note that two resources are shared between the processes: a market analyst and a quality checking machine. The activity Analyze requirement in MS and the activity Analyze requirement in MM share the same human resource, i.e., the market analyst. The activity Check products in MS and the activity Check products in MM need the same quality checking machine to check the products.

For MS and MM, an important aspect of comprehensively monitoring the execution of data-aware workflow processes is to compute their time and value throughputs, since these can help enterprises balance production capacity at each stage and determine how much capital should be recycled over a period of time, respectively.

However, the existing research works cannot obtain both time and value throughputs of data-aware workflow processes. Thus, modeling and simulation of time and value throughputs of data-aware workflow processes is of great importance to the operation of MS and MM.

Problem statement

The problems that we focus on in this paper can be described as follows.

Given multiple data-aware workflow processes that collaborate with each other:

(1) How to construct an abstract model with time and value elements and transform it into a simulation model?

(2) How to automatically analyze the time and value throughputs obtained via the simulation logs, so as to balance the production capacity at each stage and determine how much capital should be recycled over a period of time?

Constructing abstract model with time and value elements

First of all, we explain the following abbreviations or symbols in Table 1, which will be used in the rest of this paper.

Table 1 Explanation of abbreviations or symbols

Data-aware workflow processes

In this subsection, we present the definition of workflow net with data to formalize data-aware workflow processes.

Definition 1

(Workflow net, WF-net) (Shojafar et al. 2015; Pla et al. 2014; Du et al. 2011, 2014; Zhang and Jiao 2009; Li et al. 2015, 2016): A Petri net PN is called a WF-net if and only if: (1) PN has two special places \(\varepsilon \) and \(\theta \), where \(\varepsilon \) is a source place with \(^{{\bullet }}{\varepsilon }=\emptyset \) and \(\theta \) is a sink place with \(\theta ^{{\bullet }}=\emptyset \); (2) if a new transition t connecting place \(\theta \) with \(\varepsilon \) is added to PN, i.e., \(^{{\bullet }}t=\{\theta \}\) and \(t^{{\bullet }}=\{\varepsilon \}\), then the resulting net is strongly connected.
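
For illustration only, the two conditions of Definition 1 can be checked mechanically. The following minimal Python sketch (ours, using the networkx library on a hypothetical toy net) is one way to do so; it is not part of the approach itself.

```python
# Minimal sketch (not from the paper) that checks the two WF-net conditions of
# Definition 1 on a hypothetical toy net, using the networkx library.
import networkx as nx

def is_wf_net(places, transitions, flow, source="eps", sink="theta"):
    g = nx.DiGraph(flow)
    g.add_nodes_from(places | transitions)
    # (1) the source place has an empty preset and the sink place an empty postset
    if g.in_degree(source) != 0 or g.out_degree(sink) != 0:
        return False
    # (2) adding a transition t* from the sink back to the source must yield a
    #     strongly connected net
    g.add_edges_from([(sink, "t*"), ("t*", source)])
    return nx.is_strongly_connected(g)

# Hypothetical toy net: eps -> t1 -> p1 -> t2 -> theta
places = {"eps", "p1", "theta"}
transitions = {"t1", "t2"}
flow = [("eps", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "theta")]
print(is_wf_net(places, transitions, flow))   # True
```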

Definition 2

(Workflow net with data, WFD-net) (Trčka et al. 2009; De Leoni et al. 2014a): A WFD-net is a 9-tuple (\(P, T, F, D, g_{D}\), Read, Write, Destroy, Guard), where:

(1) (\(P, T, F\)) is a WF-net as in Definition 1;

(2) D is a set of data elements;

(3) \(g_{D}\) is a set of guards over D;

(4) Read: \(T \rightarrow 2^{D}\) is the reading data labeling function;

(5) Write: \(T \rightarrow 2^{D}\) is the writing data labeling function;

(6) Destroy: \(T\rightarrow 2^{D}\) is the destroying data labeling function;

(7) Guard: \(T\rightarrow g_{D}\) is the guarding function, assigning guards to transitions.

Then, in order to describe the time and value information of activities (transitions), we extend the WFD-net to the time and value workflow net with data (TVWFD-net) as follows.

Definition 3

(Time and value workflow net with data, TVWFD-net): A TVWFD-net is an 11-tuple (\(P, T, F, D, g_{D}\), Read, Write, Destroy, Guard, TI, VA), where:

(1) (\(P, T, F, D, g_{D}\), Read, Write, Destroy, Guard) is a WFD-net as in Definition 2;

(2) TI is a set of execution time distributions, one associated with each transition;

(3) VA is a set of value distributions, one associated with each transition, representing the cumulative value.
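
As an illustration of Definition 3, a TVWFD-net can be represented in code as a plain data structure. The following Python sketch shows such a representation for a small hypothetical fragment of MS; the guard, execution-time, and value entries are simplified examples rather than the exact settings of Tables 2, 3, and 4.

```python
# Illustrative sketch only: the annotation part of a TVWFD-net (Definition 3)
# for a small hypothetical fragment of MS.
from dataclasses import dataclass, field
import random

@dataclass
class TVWFDNet:
    places: set
    transitions: set
    flow: set                                     # subset of (P x T) U (T x P)
    data: set                                     # D
    read: dict = field(default_factory=dict)      # Read:    T -> 2^D
    write: dict = field(default_factory=dict)     # Write:   T -> 2^D
    destroy: dict = field(default_factory=dict)   # Destroy: T -> 2^D
    guard: dict = field(default_factory=dict)     # Guard:   T -> predicate over data values
    ti: dict = field(default_factory=dict)        # TI: execution-time sampler per transition
    va: dict = field(default_factory=dict)        # VA: value sampler per transition

net = TVWFDNet(
    places={"eps", "p1", "theta"},
    transitions={"t12", "t16"},
    flow={("eps", "t12"), ("t12", "p1"), ("p1", "t16"), ("t16", "theta")},
    data={"v1"},
    write={"t12": {"v1"}},
    read={"t16": {"v1"}},
    guard={"t16": lambda d: d["v1"] > 1000},            # simplified "material is sufficient" guard
    ti={"t12": lambda: random.uniform(2, 4)},           # hypothetical uniform execution time (h)
    va={"t12": lambda: random.normalvariate(5, 1)},     # hypothetical normal value (thousand yuan)
)
print(net.guard["t16"]({"v1": 1200}))                   # True: the Manufacture branch is taken
```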

For example, the TVWFD-net of MS and MM (Figs. 2 and 3) is shown in Fig. 4, and the detailed information of its transitions is given in Table 2. We use hours as the unit of time information and thousand yuan (RMB) as the unit of value information for an activity.

Fig. 4 The TVWFD-net of MS and MM

Table 2 Descriptions of transitions in Fig. 4

As shown in Table 2, two kinds of distributions are used for the time and value information: the uniform distribution and the normal distribution. For example, the time information \(N(22, 1)\) of \(t_{22}\) follows the normal distribution, and its value information [6, 7] follows the uniform distribution.

In MS, the data \(v_{1}\), \(v_{2}\), and \(v_{3}\) (in Table 3) generated by transitions \(t_{12}\), \(t_{13}\), and \(t_{14}\) will be used by transitions \(t_{16}\), \(t_{17}\), and \(t_{19}\) (in Table 4), respectively. In MM, transitions \(t_{22}\) and \(t_{23}\) will generate the data \(v_{4}\) and \(v_{5}\) (in Table 3), which will be used by transitions \(t_{25}\), \(t_{29}\), and \(t_{210}\) (in Table 4).

Table 3 Data information in Fig. 4

Throughput of workflow processes

The traditional throughput used in activity based approaches (Sriram et al. 2013; Liu et al. 2008; Pacini et al. 2015; Othman et al. 2012; Zhang et al. 2015) measures the number of activities completed in a time duration. This is critical for the scheduling and optimization of workflow processes, especially for real-time applications with dynamic workloads. However, it does not distinguish activities with different execution times, so it is inaccurate for workflow processes whose activities vary greatly in execution time.

To solve this problem, a novel throughput of workflow processes is proposed in the completion time based approach by Liu et al. (2013, 2014). It measures how much completion time of activities is accumulated in a time duration; that is, it uses the completion time of activities instead of the number of activities as the throughput of workflow processes. The definition of this throughput is as follows.

Definition 4

(Throughput of workflow processes) (Liu et al. 2013, 2014): Given a batch of q parallel workflow processes {\(WF_{1}\), \(WF_{2}\), ..., \(WF_{q}\)} which start at system time \(S_{0}\), the completion of a workflow activity \(a_{i}\) contributes to the entire batch of workflows with a value of \(w_{i}M(a_{i})/T\), where \(T=\sum _{i=1}^{q} M(WF_i)\), \(M(a_{i})\) is the mean duration of \(a_{i}\), and \(M(WF_{i})\) is the mean completion time of \(WF_{i}\). Assume that at the current observation time point \(S_{t}\), the set of activities newly completed since the last observation time point \(S_{t-1}\) is denoted as \(a\{\}|_{S_{t-1} }^{S_t } \); then the current system throughput is defined as \(TH|_{S_{t-1} }^{S_t } =W\times M(a\{\}|_{S_{t-1} }^{S_t } )/T\), where W is the array of the \(w_{i}\).

However, Definition 4 only considers activities completed in a time duration, but it does not consider the situation that activities are not finished completely in a time duration. Furthermore, as an important aspect of workflow processes, value is ignored by Liu et al. (2013) and Liu et al. (2014). Thus, they cannot measure value throughput of data-aware workflow processes, i.e., TVWFD-net.

Table 4 Guards of transitions in Fig. 4

Time and value throughputs of TVWFD-net

Then, we propose the concepts of time throughput and value throughput of TVWFD-net in this subsection.

Definition 5

(Time throughput of TVWFD-net) Time throughput is the total execution time contributed by the activities of a TVWFD-net in a time duration. The calculation formula is \(\left. {TH_{time} } \right| _{tp_i }^{tp_{i+1} } =T\left( {\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) +T\left( {\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \), where

(1) \(\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } \) is the set of activities completed between time points \(tp_{i}\) and \(tp_{i+1}\);

(2) \(T\left( {\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \) is the total execution time of the activities in \(\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } \);

(3) \(\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } \) is the set of activities that are not finished completely between time points \(tp_{i}\) and \(tp_{i+1}\);

(4) \(T\left( {\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \) is the total execution time of the activities in \(\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } \).

Note that, in Definition 5, \(T\left( {\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \) covers the activities that are not finished completely between time points \(tp_{i}\) and \(tp_{i+1}\), which differs from Definition 4. This contributes to a more precise time throughput.

Definition 6

(Value throughput of TVWFD-net) Value throughput is the total value created by the activities of a TVWFD-net in a time duration. The calculation formula is \(\left. {TH_{value} } \right| _{tp_i }^{tp_{i+1} } =V\left( {\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) +W\times V\left( {\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \), where

(1) \(V\left( {\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \) is the total value created by the activities in \(\left. {t_x \{\}} \right| _{tp_i }^{tp_{i+1} } \);

(2) \(V\left( {\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } } \right) \) is the total value created by the activities in \(\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } \);

(3) W denotes the completion rate set of the activities in \(\left. {t_y \{\}} \right| _{tp_i }^{tp_{i+1} } \).
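
To make Definitions 5 and 6 concrete, the following Python sketch computes both throughputs for a list of hypothetical activity records (name, start time, end time, value). Following Algorithm 6 later in the paper, it treats an activity that lies only partially inside the duration as contributing its overlapping time and, per Definition 6, the corresponding fraction (its completion rate) of its value; the records and numbers are illustrative, not taken from Fig. 5.

```python
# Sketch of Definitions 5 and 6 over hypothetical activity records
# (name, start, end, value). An activity whose start and end both fall inside
# [tp_i, tp_j] belongs to t_x{} and contributes its full execution time and
# value; an activity only partially inside the duration belongs to t_y{} and
# contributes its overlapping time and the fraction w of its value.
def throughputs(activities, tp_i, tp_j):
    th_time, th_value = 0.0, 0.0
    for name, start, end, value in activities:
        overlap = min(end, tp_j) - max(start, tp_i)
        if overlap <= 0:
            continue                       # activity lies outside the duration
        if start >= tp_i and end <= tp_j:  # t_x{}: completed within the duration
            th_time += end - start
            th_value += value
        else:                              # t_y{}: not finished completely
            w = overlap / (end - start)
            th_time += overlap
            th_value += w * value
    return th_time, th_value

# One finished activity and one that runs past the end of the duration
acts = [("t12", 0, 40, 10.0), ("t16", 30, 90, 30.0)]
print(throughputs(acts, 0, 72))   # (82.0, 31.0): 40 + 42 hours, 10 + 0.7 * 30
```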

Fig. 5 Illustration of time and value throughputs

If the time and value throughputs are provided to enterprises, there will be at least two benefits:

(1) Time throughput is helpful for the analysis and handling of workflow processes. It reflects the actual workload of an enterprise in a future period of time. With the exact time throughput, managers can balance the production capacity of each stage, e.g., reduce the number of staff and machines when receiving few product orders so as to avoid unnecessary fund consumption.

(2) Value throughput can facilitate the smooth execution of workflow processes. The changes of the value throughput reflect the fluctuations of the gross profit. Managers can determine how much capital should be recycled in a future period of time based on these changes, which helps the enterprise keep lean management with a balanced capital chain.

For example, Fig. 5 illustrates a small segment of executing instances of the TVWFD-net in Fig. 4, where gray boxes represent activities and dotted boxes indicate the waiting time of instances. The triple in parentheses labeling each gray box gives the activity name, execution time, and value, respectively. The horizontal axis is the execution timeline, and the vertical axis lists the instances.

The following are the time and value throughputs of time durations 1, 2, and 3, computed using Definitions 5 and 6, respectively.

\(\left. {TH_{time} } \right| _0^{timepoint_1 } =75+35=110\), where \(T\left( {\left. {t_x \{\}} \right| _0^{timepoint_1 } } \right) =75\) and \(T\left( {\left. {t_y \{\}} \right| _0^{timepoint_1 } } \right) =35\).

\(\left. {TH_{value} } \right| _0^{timepoint_1 } =22+10.12=32.12\), where \(V\left( {\left. {t_x \{\}} \right| _0^{timepoint_1 } } \right) =22\) and \(W\times V\left( {\left. {t_y \{\}} \right| _0^{timepoint_1 } } \right) =10.12\).

In the same way, \(\left. {TH_{time} } \right| _{timepoint_1 }^{timepoint_2 } =65\), \(\left. {TH_{value} } \right| _{timepoint_1 }^{timepoint_2 } =232.51\), \(\left. {TH_{time} } \right| _{timepoint_2 }^{timepoint_3 } =63\), and \(\left. {TH_{value} } \right| _{timepoint_2 }^{timepoint_3 } =129.36\). The computed results are listed in the second and third columns of Table 5, respectively.

Furthermore, we also apply the completion time based approach (Definition 4) to time durations 1, 2, and 3, and its results are shown in the last column of Table 5. Obviously, these throughputs are not accurate, because they do not consider that activities may not be finished completely within a time duration. Thus, they cannot provide useful guidance for the management of workflow processes.

Table 5 Throughputs for the example in Fig. 5

Transformation of TVWFD-net

In this section, we present the procedure of transforming a TVWFD-net into a simulation model in CPN Tools so as to obtain the time and value throughputs.

Analyzing tasks in transforming TVWFD-net

There are many simulation tools that can be used to simulate a TVWFD-net, such as CPN Tools, WorkflowSim, Simprocess, Tina, and PIPE. In this paper, we use CPN Tools (Jensen et al. 2007) to simulate TVWFD-nets. Note that our approach can also be applied to other simulation software or tools.

There are four tasks to be done when transforming TVWFD-net into a simulation model of CPN Tools:

Task 1: Executing instances in the same or different TVWFD-nets follow the first-come first-served (FCFS) policy to allocate resources. How to construct the structure involved in this policy in a simulation model?

Task 2: When there is a parallel structure (composed of an AND-split and an AND-join) in a TVWFD-net, the instances, i.e., tokens, execute concurrently. How to construct it in a simulation model so as to ensure that the tokens merging in the AND-join place of the parallel structure belong to the same instance?

Task 3: For writing, reading, and destroying data functions of TVWFD-net, how to realize them in a simulation model of CPN Tools?

Task 4: For guarding function of TVWFD-net, how to transform it into a simulation model?

Solution of task 1

In order to solve task 1, we present the following Algorithm 1, which adopts the queuing theory in a simulation model.

Algorithm 1: Solution to meet the requirement of FCFS

Input: The structure of FCFS in TVWFD-net.

Output: The structure corresponding to FCFS in a simulation model.

   Step 1: Based on the structure of FCFS in TVWFD-net, construct the related transitions, places, and arcs in CPN Tools.

   Step 2: In CPN Tools, extract the transition \(t_{j}\) which is involved in FCFS.

   Step 3: Add a queue model between \(t_{j}\) and its pre-transition.

   Step 4: Return the transformed structure corresponding to FCFS in a simulation model.

For example, Fig. 6a shows a structure involved in the FCFS policy in a TVWFD-net. There is only one token in place \(p_{4}\); namely, \(t_{2}\) can execute only one instance at a time. Because the instances in this TVWFD-net should meet the requirement of FCFS, the enabled tokens generated by \(t_{1}\) in place \(p_{2}\) should queue. As shown in Fig. 6b, this structure is transformed into the corresponding part of a simulation model, and the tokens in place \(p_{2}\) wait in line based on their order of arrival.
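
Independently of how the queue is wired into CPN Tools, the FCFS semantics that the transformed structure must realize can be illustrated with the following Python sketch, in which tokens arriving at \(p_{2}\) wait in a queue and the single resource token in \(p_{4}\) serves them strictly in arrival order; the instance names and times are hypothetical.

```python
# Illustration of the FCFS policy that the transformed structure must realize:
# tokens (instances) arriving at p2 wait in a queue, and the single resource
# token in p4 serves them strictly in order of arrival. Times are hypothetical.
from collections import deque

arrivals = [("MS-1", 0), ("MM-1", 2), ("MS-2", 3)]   # (instance, arrival time)
service_time = 5                                      # hypothetical duration of t2
queue = deque(sorted(arrivals, key=lambda a: a[1]))

clock = 0
while queue:
    inst, arrived = queue.popleft()        # earliest waiting token is served first
    start = max(clock, arrived)
    clock = start + service_time
    print(f"{inst}: waits {start - arrived}, starts at {start}, ends at {clock}")
```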

Solution of task 2

In a parallel structure, there are multiple tokens in the input place where transitions merge, and by default a simulation model extracts multiple tokens randomly from this place. This leads to the confusion of tokens after merging when different instances execute. In order to solve task 2, we present the following Algorithm 2.

Algorithm 2: Solution to address the problem of token confusion caused by parallel structure

Input: A parallel structure in TVWFD-net.

Output: The structure corresponding to the parallel structure in a simulation model.

   Step 1: Based on the parallel structure of TVWFD-net, construct AND-split transition \(t_{i}\) in CPN Tools.

   Step 2: Construct two virtual transitions in each split path of \(t_{i}\), and add a queuing model to these two virtual transitions.

   Step 3: Construct actual execution transition in the split path behind each queuing model.

   Step 4: Construct AND-join transition \(t_{j}\).

   Step 5: Construct a virtual transition behind the actual execution transition of each split path, and add a queuing model between the actual execution transition and the virtual transition.

   Step 6: Connect \(t_{j}\) with the queuing model constructed in Step 5.

   Step 7: Return the transformed structure corresponding to parallel structure in a simulation model.

For example, Fig. 7a shows a parallel structure of a TVWFD-net, and Fig. 7b shows its corresponding transformed structure in a simulation model of CPN Tools. In a simulation model of CPN Tools, queuing structures cannot be added directly at the AND-split and AND-join transitions; thus, we construct virtual transitions. \(t_{1a}\), \(t_{1b}\), \(t_{1c}\), \(t_{1d}\), \(t_{2a}\), and \(t_{3a}\) are virtual transitions, which send the token messages after \(t_{1}\), \(t_{2}\), and \(t_{3}\) are enabled. We construct two queuing structures in each path of this parallel structure.
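
The token-confusion problem that Algorithm 2 avoids can be summarized as follows: if the AND-join simply took any pair of tokens, tokens of different instances could be merged. The following Python sketch illustrates the intended behavior, firing the join only for instance identifiers present on both branches; the instance ids are hypothetical.

```python
# Why instance identifiers matter at an AND-join: tokens from the two branches
# arrive interleaved, and merging arbitrary pairs would mix instances. The join
# below fires only when both branches hold a token of the same (hypothetical)
# instance id, mirroring the effect of the queuing structures of Algorithm 2.
branch_after_t2 = [("MS-2", "done"), ("MS-1", "done")]
branch_after_t3 = [("MS-1", "done"), ("MS-2", "done")]

def and_join(branch_a, branch_b):
    ready_a = {inst for inst, _ in branch_a}
    ready_b = {inst for inst, _ in branch_b}
    for inst in sorted(ready_a & ready_b):
        yield inst                           # t_j fires once per matched instance

print(list(and_join(branch_after_t2, branch_after_t3)))   # ['MS-1', 'MS-2']
```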

Solution of task 3

In order to solve task 3, i.e., writing, reading, and destroying data in a simulation model, we first define the colored sets of the related data and the data variables in CPN Tools. Then, we add related code to the simulation model to implement these three functions. Algorithm 3 is presented to solve task 3.

Algorithm 3: Solution to write, read, and destroy data

Input: Writing, reading and destroying data functions in TVWFD-net.

Output: The structure corresponding to writing, reading and destroying data functions in the simulation model.

   Step 1: Define the colored set of data (colset V), the colored set of instances (colset W), and the product colored set (colset \(W\times V\)). Then, define the data variable (var v: V).

   Step 2: Analyze the type of the transition. If the transition writes data, go to Step 3; if it reads data, go to Step 4; if it destroys data, go to Step 5.

   Step 3: Construct the structure of TVWFD-net in CPN Tools. Then, add W to the inputting arc of transition, add (W, v) to the outputting arc of transition. Finally, in order to guarantee that the transition can output data variable v, add related code in this transition. Go to Step 6.

   Step 4: Construct the structure of the TVWFD-net in CPN Tools. Add a virtual transition for the transition that reads data, and then add writing data code to the virtual transition so that this virtual transition can write the data. Go to Step 6.

   Step 5: Construct the structure of TVWFD-net in CPN Tools. Add (W, v) to the inputting arc of transition, and add W to the outputting arc of transition.

   Step 6: Return the transformed structure corresponding to writing, reading and destroying data functions in a simulation model.

Fig. 6 Transformation of FCFS policy. a The structure of FCFS in TVWFD-net. b The transformed structure in CPN Tools

Fig. 7 Transformation of parallel structure. a A parallel structure in TVWFD-net. b The transformed structure in CPN Tools

For example, data \(v_{1}\) is generated after \(t_{1}\) fires in Fig. 8a. In order to write the data, we first define the colored set of data \(v_{1}\) as “colset V = int with 1..10” in Fig. 8b. We also define the colored set of the instance as “colset \(W_{1}\) = with \(W_{1}\)”, and define the product colored set as “colset \(W_{1}\times V \) = product \(W_{1}\times V\)”. After defining the colored sets, we define the variable of data \(v_{1}\) as “var \(v_{1}\): V”, so that the data message can be sent along the arcs. The transformed structure in CPN Tools is shown in Fig. 8b.

For example, in Fig. 9a the transition \(t_{2}\) reads the data \(v_{1}\) and \(v_{2}\) generated during process execution before it fires. In order to read the data, we first define the colored sets of data \(v_{1}\) and \(v_{2}\) as “colset \(V_{1}\) = int with 1..10” and “colset \(V_{2}\) = int with 10..20”, respectively. We also define the colored set of the instance as “colset \(W_{1}\) = with \(W_{1}\)”, and define the product colored set as “colset \(W_{1}\times V_{1}\times V_{2}\) = product \(W_{1}\times V_{1}\times V_{2}\)”. After defining the colored sets, we define the variables of data \(v_{1}\) as “var \(v_{1}\), \(v_{1}\)_: \(V_{1}\)” and of data \(v_{2}\) as “var \(v_{2}\), \(v_{2}\)_: \(V_{2}\)”. Then, we obtain the transformed structure in CPN Tools shown in Fig. 9b.

Fig. 8 Transformation of writing data function. a A writing data structure in TVWFD-net. b The transformed structure in CPN Tools

For example, in Fig. 10a the transition \(t_{1}\) destroys data \(v_{1}\) after it is enabled. In order to destroy the data, we define the colored set of data \(v_{1}\) as “colset \(V_{1}\) = int with 1..10”. We also define the colored set of the instance as “colset \(W_{1}\) = with \(W_{1}\)”, and define the product colored set as “colset \(W_{1}\times V \) = product \(W_{1}\times V\)”. After defining the colored sets, we define the variable of data \(v_{1}\) as “var \(v_{1}\): V”. The transformed structure is shown in Fig. 10b.
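
The combined effect of the three data functions on the simulation state can be pictured with a small per-instance data store, as in the following Python sketch; in the CPN model this state actually travels along the arcs as the product colored sets described above, so the sketch only illustrates the intended semantics with hypothetical values.

```python
# Sketch of the write / read / destroy semantics realized in the CPN model,
# using a per-instance dictionary as the data store; in CPN Tools this state
# travels on the arcs as product colored sets. Values are hypothetical.
import random

store = {}                                   # instance id -> {data element: value}

def write(inst, elem, sampler):              # e.g. t1 writes v1 when it fires
    store.setdefault(inst, {})[elem] = sampler()

def read(inst, elem):                        # e.g. t2 reads v1 before it fires
    return store[inst][elem]

def destroy(inst, elem):                     # e.g. t1 destroys v1 after it is enabled
    store[inst].pop(elem, None)

write("W1", "v1", lambda: random.randint(1, 10))   # cf. colset V = int with 1..10
print(read("W1", "v1"))
destroy("W1", "v1")
print(store["W1"])                           # {} after v1 is destroyed
```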

Solution of task 4

In order to solve task 4, i.e., to realize the guarding function of a TVWFD-net in a simulation model, we first define the colored sets, data variables, and guard functions of the related data. Then, we add several transitions, places, and arcs in the work area of the simulation model and invoke the predefined guard functions. Algorithm 4 is presented to solve task 4.

Algorithm 4: Solution to realize the guarding function

Input: A guarding function structure in TVWFD-net.

Output: The structure corresponding to the guarding function in a simulation model.

   Step 1: Invoke Step 4 in Algorithm 3 to add the structure model of reading data.

   Step 2: Define guard functions Guard (), the colored sets of path colset T = int with 1, 2, ..., n and the colored sets of split path colset \(W\times T\) = product \(W\times T\) in CPN Tools.

   Step 3: Add the guard function to the output arcs of the virtual transitions in the model of Step 1. Add (W, 1), (W, 2), \(\ldots \), (W, n) to the output arcs of the place that is the input of the arcs carrying the function Guard ().

   Step 4: Return transformed structure corresponding to guarding function in a simulation model.

For example, as shown in Fig. 11a, transitions \(t_{2}\) and \(t_{3}\) read the data \(v_{1}\) and \(v_{2}\) generated during process execution, respectively. Based on the guarding functions guard (\(t_{2}\)) and guard (\(t_{3}\)), either \(t_{2}\) or \(t_{3}\) will fire during process execution. First, we define the guarding function Guard (), the colored set of paths as “colset T = int with 1, 2,..., n”, and the colored set of the selective path as “colset \(W_{1}\times T \) = product \(W_{1}\times T\)”. Then, we add the virtual transition \(t_{4}\). In order to control the execution of the process conveniently, guard (\(t_{2}\)) and guard (\(t_{3}\)) are coded in the guarding function Guard (). The code of Guard () is shown in the following listing, where the transformation of guard (\(t_{2}\)) and guard (\(t_{3}\)) should correspond to the actual situation. Finally, we obtain the transformed structure in CPN Tools shown in Fig. 11b.

figure a
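
Since the code of Guard () is shown only as a listing, the following hedged Python sketch illustrates the kind of routing such a function performs, using the three MS cases from the motivation case (\(v_{1}\), \(v_{2}\), and \(v_{3}\) are the data read before the choice, and the returned path numbers 1, 2, 3 stand for Manufacture, Purchase materials, and Outsource production); the actual CPN ML code may differ in detail.

```python
# Hedged sketch of the routing performed by a Guard () function, based on the
# three MS cases of the motivation section; the actual CPN ML code shown in the
# listing may differ. Path 1 = Manufacture, 2 = Purchase materials,
# 3 = Outsource production.
def guard_ms(v1, v2, v3):
    if v1 > 1000 and v2 == 1 and v3 < 5000:
        return 1        # material is sufficient
    if v1 <= 1000 and v2 == 2 and v3 < 5000:
        return 2        # material is in shortage
    return 3            # otherwise, outsource production

print(guard_ms(1200, 1, 3000))   # 1
print(guard_ms(800, 2, 3000))    # 2
print(guard_ms(800, 1, 6000))    # 3
```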

The whole procedure of transforming TVWFD-net

Based on the discussion above, we propose Algorithm 5 to transform TVWFD-net into a simulation model of CPN Tools automatically.

Algorithm 5: Transform TVWFD-net into simulation model

Input: TVWFD-net.

Output: Simulation model in CPN Tools.

   Step 1: Define the color sets of process \(P_{i}\) (i=1, 2, ..., n), resource \(R_{j}\) (j=1, 2, ..., m), execution time, value and data according to TVWFD-net. Then define the corresponding variables based on the color sets.

   Step 2: Obtain the transition set \(\Pi \)= {\(t_{1}\), \(t_{2}\), ...,\( t_{a}\)}, the place set \(\Psi \)={\(p_{1}\), \(p_{2}\), ..., \(p_{b}\)} and the relationship set \(\Omega \) from TVWFD-net. According to \(\Pi \),\(\Psi \) and \(\Omega \), create transitions T, places P and arcs F. Generate the structure of simulation model \(\Delta \) by composing T, P and F.

   Step 3: Invoke Algorithm 1 to ensure that executing instances in the same or different TVWFD-nets follow the FCFS policy when they share resources.

   Step 4: Invoke Algorithm 2 so that the tokens merging in the AND-join place of a parallel structure belong to the same instance.

   Step 5: Invoke Algorithm 3 to write, read, and destroy data in the TVWFD-net.

   Step 6: Invoke Algorithm 4 to realize the guarding function of TVWFD-net.

   Step 7: Create transitions \(T_{generate}\) and places \(P_{generate}\), and connect them with arcs \(F_{generate}\). Define the generate functions to generate the workflow instances; these functions are invoked by the arcs \(F_{generate}\). Then return the simulation model.

Fig. 9 Transformation of reading data function. a A reading data structure in TVWFD-net. b The transformed structure in CPN Tools

Fig. 10 Transformation of destroying data function. a A destroying data structure in TVWFD-net. b The transformed structure in CPN Tools

Fig. 11 Transformation of guarding function. a A guarding function structure in TVWFD-net. b The transformed structure in CPN Tools

According to Algorithm 5, a simulation model in CPN Tools can be obtained automatically. In Algorithm 5, Step 1 defines the color sets and variables in CPN Tools, Steps 2–6 construct the simulation model according to the TVWFD-net, and Step 7 generates the workflow instances in the simulation model in CPN Tools.

Obtaining and analyzing time and value throughputs

In this section, we present the procedure of obtaining the time and value throughputs by analyzing the logs in CPN Tools. Then, we illustrate the analysis procedure with the motivation case from the “Motivation case and problem statement” section.

Obtaining time and value throughputs

After running a simulation model in CPN Tools, a simulation log is saved automatically for the user. We then need to extract the useful records from the simulation log, since it includes all data related to the operation of the simulation model, while only a small part is useful for computing the throughputs. However, the records produced during simulation are very complex. The skeleton of obtaining the time and value throughputs from the simulation logs is shown in Fig. 12, and the automatic procedure is given in Algorithm 6.

Algorithm 6: Obtain time and value throughputs by simulation logs

Input: The set of simulation logs SLs and the length of time duration Dur.

Output: Time and value throughputs.

   Step 1: Input the length of time duration. Then, according to the simulation logs whose time period is the longest in SLs, extract the set of time duration Durs.

   Step 2: For \(\forall \) \(Dur_{i} \in \) Durs (1 \(\le i\le n\)), extract the initial time duration \(Dur_{i}\) together with its start time ST and end time ET. Then Durs \(\leftarrow \) Durs − {\(Dur_{i}\)}.

   Step 3: For \(\forall \)SL\(_{j} \in \)SLs (1\(\le j\le m)\), extract the initial simulation log SL\(_{j}\). Then SLs\(\leftarrow \)SLs-{SL\(_{j}\)}.

   Step 4: Read the simulation log SL\(_{j}\), whose format is .txt. Then search for the enabled time \(T_{eti}\) and firing time \(T_{fti}\) of transition \(t_{i}\) in SL\(_{j}\). If ST \(\le T_{eti}\le \) ET and ST \(\le T_{fti}\le \) ET, \(t_{i}\) is completed during the time duration and is denoted as \(t_{x}\). If \(T_{eti}\le \) ST or ET \(\le T_{fti}\), \(t_{i}\) is not finished completely during the time duration and is denoted as \(t_{y}\). Collect the transitions \(t_{x}\) and \(t_{y}\), and then create the transition sets \(T_{x}\) and \(T_{y}\).

   Step 5: Extract the enabled time \(T_{et}\), firing time \(T_{ft}\), execution time \(T_{ext}\) and value \(V_{v }\)of the transitions in \(T_{x}\) and \(T_{y}\) from the information set IS in SL\(_{j}\). Then create useful information sets UIS\(_{x}\) and UIS\(_{y}\).

   Step 6: Calculate the time and value throughputs of time duration \(Dur_{i}\) for SL\(_{j}\). Gather the transition sets \(T_{x}\) and \(T_{y}\), and the corresponding information sets UIS\(_{x}\) and UIS\(_{y}\). Then invoke the calculation formulas in Definition 5 and Definition 6, respectively.

   Step 7: If SLs =\(\emptyset \), calculate the time and value throughputs of the initial time duration in simulation logs. Then, go to Step 2. Otherwise, go to Step 3.

   Step 8: If Durs = \(\emptyset \), the algorithm ends and outputs the throughputs. Otherwise, return to Step 2.

In Algorithm 6, Steps 1–2 obtain the time durations and their start and end times. Steps 3–7 calculate the time and value throughputs of each time duration. Step 8 outputs the throughputs of all time durations.
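
The outer structure of Algorithm 6 can be sketched as two nested loops over time durations and simulation logs, with the per-duration computation of Definitions 5 and 6 factored out. In the following Python sketch, parse_log() and throughputs() are stand-ins for the log-extraction and calculation steps (Steps 4–6), and the example input is illustrative rather than an actual CPN Tools log.

```python
# Structural sketch of Algorithm 6: split the simulated horizon into durations
# of length dur_len and, for every simulation log, compute the time and value
# throughputs of each duration. parse_log() and throughputs() are stand-ins
# for the log-extraction and Definition 5/6 calculations (Steps 4-6).
def analyze(logs, dur_len, horizon, parse_log, throughputs):
    durations = [(i * dur_len, (i + 1) * dur_len)
                 for i in range(int(horizon // dur_len))]            # Step 1
    results = {}
    for tp_i, tp_j in durations:                                     # Step 2
        per_log = []
        for log in logs:                                             # Step 3
            acts = parse_log(log)                                    # Steps 4-5
            per_log.append(throughputs(acts, tp_i, tp_j))            # Step 6
        # average over all logs for this duration (Steps 7-8)
        results[(tp_i, tp_j)] = tuple(sum(x) / len(x) for x in zip(*per_log))
    return results

# Trivial usage with two already-parsed logs of (transition, start, end, value)
logs = [[("t12", 0, 40, 10.0)], [("t12", 5, 45, 12.0)]]
dummy = lambda acts, i, j: (sum(a[2] - a[1] for a in acts), sum(a[3] for a in acts))
print(analyze(logs, dur_len=72, horizon=72,
              parse_log=lambda log: log, throughputs=dummy))
# {(0, 72): (40.0, 11.0)}
```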

Then, we design and develop a prototype system, the throughput analyzer, to extract useful data from the simulation logs so as to compute the time and value throughputs. The throughput analyzer is developed in C#. It can calculate the time and value throughputs of time durations whose length is set by the user. Figure 13 shows a screenshot of the throughput analyzer. It is used in the following three steps: (1) Select the simulation log(s) to be analyzed: if there is a single simulation log, click the “Select a log” button; if there are multiple simulation logs, select the folder where the simulation logs are located and click the “Select logs” button. (2) Set the length of the time durations; the throughput analyzer then determines how many time durations are needed based on this length. (3) Click the “Output the time and value throughputs” button; the time and value throughputs of each time duration are displayed in the throughput analyzer.

Analysis of results

In this subsection, we take the motivation case in the “Motivation case and problem statement” section as an example to illustrate our approach. Due to the different sales frequencies over a year in a manufacturing enterprise, there are two sales conditions for MS and MM: the normal season and the peak season. Their settings are assumed as follows:

(1) Normal season: The time between two arrivals of orders has a mean of 100 h in MS and 72 h in MM. The inter-arrival time follows an exponential distribution with parameter \(\lambda \)=1/100 in MS and \(\lambda \)=1/72 in MM.

(2) Peak season: The time between two arrivals of orders has a mean of 30 h in MS and 40 h in MM. The inter-arrival time follows an exponential distribution with parameter \(\lambda \)=1/30 in MS and \(\lambda \)=1/40 in MM.

For each season, we first obtain the transformed simulation model of the TVWFD-net based on Algorithm 5. Then, we run it in CPN Tools and use the prototype throughput analyzer to obtain the time and value throughputs of the normal and peak seasons, respectively. Here, we set the length of the time duration to 3 days (72 h).

Analyzing results of the normal season: As shown in Fig. 14, we obtain the time and value throughputs of 30 time durations for the normal season. The mean and maximum time throughputs are 103 and 135 h, respectively, and the mean and maximum value throughputs are 203 and 251 thousand yuan, respectively.

In the normal season, we assume that the maximum capacity of the manufacturing enterprise is 125 h in terms of labor force and 248 thousand yuan in terms of cost. According to the simulation results shown in Fig. 14, the time throughputs in the fifth, twentieth, and twenty-second durations are 128, 128, and 134 h, respectively. The enterprise therefore needs to increase the input of labor force one day in advance to achieve the balance between supply and demand and to ensure smooth production. In the sixteenth and thirteenth durations, the value throughputs are 249 and 251 thousand yuan, respectively. The enterprise needs to prepare some funds in advance, so as to ensure the balance of payments and avoid a rupture of the capital chain.
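
The management rule applied above can be expressed directly as a comparison of the throughput series against the assumed capacities (125 h of labor and 248 thousand yuan of cost). In the following Python sketch, only the peak values quoted above match the paper; the remaining series values are illustrative, not the simulated data of Fig. 14.

```python
# Flag durations whose time or value throughput exceeds the assumed capacity
# (125 labor hours, 248 thousand yuan), as done above for the normal season.
# Only the quoted peaks match the text; the other values are made up.
time_capacity, value_capacity = 125, 248
time_th = {5: 128, 13: 120, 16: 118, 20: 128, 22: 134}
value_th = {5: 230, 13: 251, 16: 249, 20: 240, 22: 236}

need_extra_labor = [d for d, th in time_th.items() if th > time_capacity]
need_extra_funds = [d for d, th in value_th.items() if th > value_capacity]
print("increase labor input in durations:", need_extra_labor)   # [5, 20, 22]
print("prepare funds in durations:", need_extra_funds)          # [13, 16]
```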

Fig. 12 The procedure of obtaining time and value throughputs described by UML activity diagram

Analyzing results of the peak season: As shown in Fig. 15, we obtain the time and value throughputs of 30 time durations for the peak season. The mean and maximum time throughputs are 237 and 260 h, respectively, and the mean and maximum value throughputs are 421 and 464 thousand yuan, respectively.

In the peak season, we assume that the maximum capacity of the enterprise is 255 h in terms of labor force and 460 thousand yuan in terms of cost. According to the simulation results shown in Fig. 15, the time throughput in the twentieth duration is 260 h. The enterprise needs to increase the input of labor force in advance to ensure that the production resources meet the market demand. In the twenty-sixth and sixteenth durations, the value throughputs are 461 and 464 thousand yuan. At these times, it is necessary to receive the returns from clients in advance, so as to ensure the balance of revenue and maintain the normal operation of the enterprise.

Fig. 13 Screenshot of throughput analyzer

Fig. 14 The time and value throughputs of normal season with 3 days length of time duration. a Time throughput. b Value throughput

Fig. 15 The time and value throughputs of peak season with 3 days length of time duration. a Time throughput. b Value throughput

Fig. 16 Throughputs for the activity based approach. a Throughput of normal season. b Throughput of peak season

Fig. 17 The difference during normal season with 3 days length of time duration. a Difference of time throughput. b Difference of value throughput

Fig. 18 The difference during peak season with 3 days length of time duration. a Difference of time throughput. b Difference of value throughput

Fig. 19 The difference during normal season with 4 days length of time duration. a Difference of time throughput. b Difference of value throughput

Fig. 20 The difference during peak season with 5 days length of time duration. a Difference of time throughput. b Difference of value throughput

Comparison and discussion

As discussed above, the time and value throughputs proposed in this paper are more accurate and efficient than the traditional throughputs. The activity based approaches (Sriram et al. 2013; Liu et al. 2008; Pacini et al. 2015; Othman et al. 2012; Zhang et al. 2015) and the completion time based approach (Liu et al. 2013, 2014) cannot be applied directly to the problem of analyzing the time and value throughputs of a TVWFD-net, since they do not consider how to transform a TVWFD-net into a simulation model and they ignore the value aspect of a TVWFD-net.

In order to show the advantages of our approach, we compare it with the activity based approach and the completion time based approach on the motivation case. For the execution of instances and the length of time durations, we use the same settings as in the “Analysis of results” section, and we make the following assumptions:

(1) For the activity based approach, we assume that it uses our approach to transform the TVWFD-net into a simulation model, that its time throughput is the number of activities, and that its value throughput is 0;

(2) For the completion time based approach, we assume that it uses our approach to transform the TVWFD-net into a simulation model, and that its value throughput is defined as the total value created by the activities that have been completed in a basic observation time unit.

The detailed procedures of the two comparisons are as follows.

Comparison with the activity based approach: We obtain the time throughputs of 30 time durations using the activity based approach in the normal season and the peak season, shown in Fig. 16a, b, respectively. The mean and maximum time throughputs of the normal season are 14 and 17, respectively, and those of the peak season are 32 and 35. In both seasons, the value throughput is 0.

Conclusion of comparison 1: With the throughput of the activity based approach, we only obtain the number of activities that have been completed in a time duration, which treats the contribution of every activity to the throughput as the same. However, the completion times of the activities in a workflow process differ in a real business environment; namely, their actual contributions to meeting deadlines are unequal. Therefore, we conclude that the activity based throughput is inaccurate and of little use to the management of enterprises in balancing the production capacity and the capital chain. In contrast, our approach obtains the time and value throughputs, which can help enterprises balance production capacity at each stage and determine how much capital should be recycled over a period of time, respectively.

Comparison with the completion time based approach: We compare the difference in time and value throughputs between the completion time based approach and our approach for two cases, the normal season and the peak season, with a 3-day time duration. The results for the normal season are shown in Fig. 17, and Fig. 18 shows the results for the peak season.

As shown in Fig. 17a, b, the mean differences in time and value throughput between the completion time based approach and our approach during the normal season are 19 and 103, respectively; the difference in time throughput accounts for 18.45% and that in value throughput for up to 50.74%. As shown in Fig. 18a, b, the mean differences in time and value throughput in the peak season are 42 and 194, respectively; the difference in time throughput accounts for 17.72% and that in value throughput for up to 46.08%.

Furthermore, in order to show the difference in time and value throughputs between the completion time based approach and our approach for different lengths of time durations, we change the length of the time duration to 4 days for the normal season and obtain the results shown in Fig. 19a, b. The differences of the mean and maximum time throughputs in the normal season are 20 and 27, respectively, and the differences of the mean and maximum value throughputs are 108 and 137, respectively.

We also change the length of the time duration to 5 days for the peak season and obtain the results shown in Fig. 20a, b. The differences of the mean and maximum time throughputs in the peak season are 42 and 49, respectively, and the differences of the mean and maximum value throughputs are 201 and 225, respectively.

Conclusion of comparison 2: Based on the discussion above, we conclude that our approach is more efficient and accurate than the completion time based approach. The time and value throughputs reflect the actual workload and the gross profit of an enterprise in a future period of time, respectively. With the exact time throughput, on the one hand, managers can balance the production capacity of each stage, e.g., reducing the number of staff and machines when receiving few product orders to avoid unnecessary fund consumption. On the other hand, managers can determine how much capital should be recycled in a future period of time based on the value throughput. A balanced capital chain then ensures that the workflow processes execute smoothly and that the enterprise keeps lean management.

Conclusion

Time and value throughputs are important for managing data-aware workflow processes in enterprises, since they reflect the actual workload and gross profit of enterprises over a period of time, respectively. However, the existing methods are neither reliable nor accurate in calculating the throughputs of workflow processes. Furthermore, they cannot model and simulate data-aware workflow processes and do not consider the throughput of workflow processes from the value aspect.

In this paper, we focus on modeling and simulation of the time and value throughputs of data-aware workflow processes. On the one hand, both time and value elements are considered when constructing the abstract model of data-aware workflow processes. On the other hand, these two throughputs are calculated by considering not only the total execution time and value of the activities finished in a time duration, but also those that cannot be finished completely within it. Compared with the existing works, this is the first attempt to propose the concepts of time and value throughputs of data-aware workflow processes, together with the whole procedure of modeling and simulating them. Furthermore, a prototype system, the throughput analyzer, is designed and developed to obtain and analyze them automatically. Based on these two throughputs, we can help enterprises balance production capacity at each stage as well as determine how much capital should be recycled over a period of time.

In the future, we would like to extend our research to the following issues: (1) how to handle changes of a TVWFD-net at run time; (2) how to extend our approach to scenarios with resource allocation strategies other than FCFS.