
1 Introduction

Process modeling and analysis have multiple application domains. For example, clinical pathways are an evidence-based response to specific problems and care needs in clinics. They support physicians by recommending the sequence and timing of the actions necessary for efficient patient treatment [1, 2]. Each clinic has its own pathways based on its individual evidence and experience. Therefore, there are multiple pathways that target different problems and care needs [3–5].

However, physicians are not strictly bound to the published pathways. Therefore, the process defined in the pathway (target process model) can differ from the workflow that is actually performed (current process), which is based on the physician's decisions on how to treat the patient.

This situation is aggravated by the large amount of data that is generated and used during the treatment of patients and that needs to be managed and interpreted. To ease this task, the information can be captured semantically and used for comparisons. For this purpose, many ontologies already exist in the medical domain that can be used to structure the semantic information, for example the Disease OntologyFootnote 1, which provides descriptions of and related medical terms for human diseases, and the Foundational Model of Anatomy (FMA)Footnote 2, which describes the classes, structures and relationships of all parts of the human body. In addition, processes can be compared based on different service qualities and dimensions, such as complexity, runtime, outcome or costs.

The problem of current processes diverging from the defined target process models is not limited to the medical domain. The same difficulties arise in enterprises and in Internet of Things (IoT) applications, in which the actual communication flow between devices can diverge from a defined target process model. These are precisely the topics we want to explore. One open question is whether the current process performs better than the defined target process model in terms of certain service qualities and dimensions. Knowing the deviation and the resulting differences in service qualities and dimensions could provide an incentive to adapt the target process model.

Given a set of current processes and their target process model, we are interested in calculating the similarity between them in order to quantify their variety and to see how different processes behave with respect to different service qualities and dimensions. Revealing the effect on service qualities and dimensions can serve several purposes. One is to provide a confidence interval for a service quality variable. Another is to suggest adapting the target process model if the current process instances diverge too much from it. The target process model can then be adapted with respect to certain service qualities and dimensions.

2 State of the Art

To establish a common view of processes, we first define the term process. We use the process definition from ISO 9000:2015 [6], given below.

Definition 1

ISO 9000:2015 Process: Set of interrelated or interacting activities that use inputs to deliver an intended result.

Note 1 to entry: Whether the “intended result” of a process is called output, product or service depends on the context of the reference.

Note 2 to entry: Inputs to a process are generally the outputs of other processes and outputs of a process are generally the inputs to other processes.

This definition is specifically related to quality management systems, but we aim to use it in a broader way. We do not focus on quality management systems in particular but rather on processes in general.

Process semantic annotations: There are already widely used ontologies, such as the Dublin Core SchemaFootnote 3, that provide a set of metadata terms for annotating resources. We can use these ontologies to annotate process elements of the target process model, such as tasks and gateways. The advantage of such schemata is that they can be integrated easily to annotate resources and provide interoperability with further datasets.
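As an illustration of such annotations, the following minimal sketch attaches Dublin Core `dcterms:title` and `dcterms:description` statements to a task and serializes them as N-Triples. The base URI `http://example.org/process/`, the element ID and the helper names are hypothetical; only the Dublin Core term URIs are real. A production setup would more likely use an RDF library than hand-built strings.

```python
# Sketch: annotate a BPMN process element with Dublin Core terms as N-Triples.
# BASE and the element ID are hypothetical; DCTERMS is the real Dublin Core namespace.
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/process/"


def literal(text: str) -> str:
    """Escape a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def annotate(element_id: str, title: str, description: str) -> list[str]:
    """Return N-Triples statements attaching dcterms:title and dcterms:description."""
    subject = f"<{BASE}{element_id}>"
    return [
        f"{subject} <{DCTERMS}title> {literal(title)} .",
        f"{subject} <{DCTERMS}description> {literal(description)} .",
    ]


triples = annotate("Task_Anamnesis", "Anamnesis", "Collect the patient's medical history")
print("\n".join(triples))
```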

Semantic process-based formalizations and conformance checking: The Business Process Abstract Language (BPAL) provides formal semantics for process modeling languages [7, 8] and allows enriching them with semantic annotations. The formal definition allows verifying the used properties and the ontology-based annotations. Thus, this approach can be used to semi-automatically map current process instances to their target process model and to verify the mapping against a correct semantic annotation [9–12]. There are also ontology-based annotations for process models that can be reused [13, 14]. In addition to semantic annotations, there are ontologies that describe the components of a process and the relationships between them, such as SUPER [15, 16] and the Process Specification LanguageFootnote 4, which has been approved as an international standard [17].

Service qualities and dimensions: Existing approaches describe how service qualities and dimensions can be captured [18]. For example, frameworks such as SERVQUAL can be used to measure the quality of processes [19]. Service qualities of e-services [20] and other process performance indicators [21] can also serve as metrics for measuring process performance.

Process matching: There are a number of process similarity measures for comparing processes. Some use node similarity, structural similarity, behavioral similarity and language-based matching [22, 23]. However, most of them focus on business processes [24, 25] and do not distinguish between target and current process models. Process similarity is also used, among other things, to cluster processes [26].
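To make node similarity concrete, the sketch below implements a simple greedy variant of label-based node matching: every target task is paired with at most one current task, best-matching pairs first, and the matched similarities are normalized by the total number of nodes. The string measure (`difflib.SequenceMatcher.ratio`), the 0.6 threshold and the function names are assumptions for illustration, not the measures of the cited works.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive syntactic similarity of two task labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def node_similarity(target: list[str], current: list[str], threshold: float = 0.6) -> float:
    """Greedy one-to-one matching of task labels, normalized to [0, 1].

    Candidate pairs are visited from most to least similar; pairs below the
    threshold are ignored and each node is matched at most once.
    """
    if not target and not current:
        return 1.0
    pairs = sorted(
        ((label_similarity(t, c), i, j)
         for i, t in enumerate(target)
         for j, c in enumerate(current)),
        reverse=True,
    )
    matched_t: set[int] = set()
    matched_c: set[int] = set()
    total = 0.0
    for sim, i, j in pairs:
        if sim < threshold:
            break  # all remaining pairs are even less similar
        if i not in matched_t and j not in matched_c:
            matched_t.add(i)
            matched_c.add(j)
            total += sim
    # Each matched pair covers two nodes, hence the factor 2.
    return 2 * total / (len(target) + len(current))


target = ["Admit patient", "Take blood sample", "Diagnose"]
current = ["Admit patient", "Blood sample", "Diagnosis", "Discharge"]
print(round(node_similarity(target, current), 3))
```

The greedy matching keeps the sketch short; an optimal assignment (e.g. the Hungarian algorithm) could replace it when label similarities are ambiguous.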

Adaption of target process models: Approaches such as process mining try to reveal hidden structures and derive a target process from log files or other data produced by process instances [27, 28]. These approaches reveal hidden structures but not the influence of processes on different service qualities and dimensions. However, process mining techniques can be used to discover new insights based on a reference model created from the current process data [29].
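A common starting point for process discovery is the directly-follows relation: how often one activity is immediately followed by another across the traces of an event log. The sketch below computes these counts from a toy log; the log and activity names are invented for illustration, and a real analysis would read traces from an event log format such as XES rather than from a Python list.

```python
from collections import Counter


def directly_follows(log: list[list[str]]) -> Counter:
    """Count how often activity a is directly followed by activity b across all traces."""
    dfg: Counter = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical event log: one list of activity names per process instance.
log = [
    ["admit", "diagnose", "treat", "discharge"],
    ["admit", "diagnose", "discharge"],
    ["admit", "diagnose", "treat", "discharge"],
]
dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```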

3 Problem Statement and Contributions

We focus on performing similarity analysis between target and current processes by exploiting the semantics of processes. The semantics that we use to compare process models include the semantic annotations (such as labels and descriptions) that we add to the process models, a domain hierarchy of the process elements, user roles (for example, only specific users are allowed to perform a task or make a decision) and rules that define the workflow of processes. Based on the presented motivation and the current state of the art, we formulate the following research question and its subquestions:

How do we benefit from the combination of process models with semantics in order to improve processes by performing similarity analysis?

  • RQ1 How can we formally specify process data with semantics?

  • RQ2 Which service qualities and dimensions can we use to compare processes?

  • RQ3 Which methods can we use to perform similarity analysis of target processes and current process data?
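The kinds of semantics listed above (annotations, a domain hierarchy, user roles and workflow rules) could be held in a small data model such as the following. This is a hypothetical schema, not the representation used in this work: a task carries a label, a description, a concept from a domain hierarchy, the roles allowed to perform it, and a simple precedence rule (`requires`) that a trace must respect.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticTask:
    """A process element enriched with illustrative semantics (hypothetical schema)."""
    task_id: str
    label: str
    description: str = ""
    concept: str = ""  # class in a domain hierarchy or ontology
    allowed_roles: set[str] = field(default_factory=set)  # empty set: no restriction
    requires: set[str] = field(default_factory=set)  # tasks that must occur earlier


def may_perform(task: SemanticTask, role: str) -> bool:
    """Role rule: only listed roles may perform the task (if any are listed)."""
    return not task.allowed_roles or role in task.allowed_roles


def respects_rules(trace: list[str], tasks: dict[str, SemanticTask]) -> bool:
    """Check that every task in the trace occurs after all tasks it requires."""
    seen: set[str] = set()
    for task_id in trace:
        if not tasks[task_id].requires <= seen:
            return False
        seen.add(task_id)
    return True


tasks = {
    "T1": SemanticTask("T1", "Take blood sample", concept="BloodTest",
                       allowed_roles={"nurse", "physician"}),
    "T2": SemanticTask("T2", "Diagnose", concept="Diagnosis",
                       allowed_roles={"physician"}, requires={"T1"}),
}
print(may_perform(tasks["T2"], "nurse"))
print(respects_rules(["T1", "T2"], tasks), respects_rules(["T2", "T1"], tasks))
```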

During the PhD, we will develop an approach to annotate process data with semantic information and to perform similarity analysis of target process models and current processes. This approach will be designed in a domain-independent way so that it is generally applicable. In the following, we discuss the subquestions in more detail.

(RQ1) How can we formally specify process data with semantics?

There are already established formal representations for modeling languages, for example BPMN 2.0 XML, the standard serialization format of BPMN 2.0 published by the OMGFootnote 5, or the Petri Net Markup Language [30] for representing Petri nets. However, while the execution semantics of processes is partly covered, semantics are lacking for the inputs and outputs used in the processes, for annotations and for the terminology of process elements. Therefore, we will show how to combine formally specified process models with semantics that can be queried and processed. The enriched current process instances can then be used for comparison and analysis.
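To illustrate what BPMN 2.0 XML does and does not capture, the sketch below reads tasks and sequence flows from a small hand-written model using Python's standard `xml.etree.ElementTree`. The control flow is explicit, but a task's `name` is only a plain string with no link to any domain concept, which is the gap described above. The model and its IDs are invented; only the BPMN 2.0 model namespace URI is taken from the specification.

```python
import xml.etree.ElementTree as ET

NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical minimal model: start -> "Admit patient" -> "Diagnose" -> end.
MODEL = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" id="Defs_1">
  <bpmn:process id="Process_1">
    <bpmn:startEvent id="Start"/>
    <bpmn:task id="Task_A" name="Admit patient"/>
    <bpmn:task id="Task_B" name="Diagnose"/>
    <bpmn:endEvent id="End"/>
    <bpmn:sequenceFlow id="F1" sourceRef="Start" targetRef="Task_A"/>
    <bpmn:sequenceFlow id="F2" sourceRef="Task_A" targetRef="Task_B"/>
    <bpmn:sequenceFlow id="F3" sourceRef="Task_B" targetRef="End"/>
  </bpmn:process>
</bpmn:definitions>"""


def extract(xml_text: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Return {task id: task name} and the list of (source, target) sequence flows."""
    root = ET.fromstring(xml_text)
    tasks = {t.get("id"): t.get("name") for t in root.iterfind(".//bpmn:task", NS)}
    flows = [(f.get("sourceRef"), f.get("targetRef"))
             for f in root.iterfind(".//bpmn:sequenceFlow", NS)]
    return tasks, flows


tasks, flows = extract(MODEL)
print(tasks)  # names are plain strings: no link to a domain concept
print(flows)
```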

(RQ2) Which service qualities and dimensions can we use to compare processes?

Processes can be compared based on different service qualities and dimensions such as runtime, outcome, costs or reliability. Capturing these service qualities and dimensions is a first step towards comparing the defined target process model with the current processes. We will analyze existing frameworks (see Sect. 2) with respect to their coverage and their usability in different domains and may extend them. As a possible output, we may propose a new framework.
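As a small example of turning captured service qualities into the confidence intervals mentioned in Sect. 1, the sketch below summarizes two hypothetical service qualities (runtime and cost) over a set of process instances. It uses a normal approximation (z = 1.96); for samples as small as this one, a Student's t interval would be more appropriate. Field names and values are invented.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class InstanceQualities:
    """Service qualities measured for one current process instance (hypothetical fields)."""
    runtime_min: float
    cost_eur: float


def mean_with_ci(values: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Sample mean and approximate 95% confidence interval (normal approximation)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


instances = [
    InstanceQualities(42, 310),
    InstanceQualities(55, 290),
    InstanceQualities(47, 350),
    InstanceQualities(61, 400),
]
for quality in ("runtime_min", "cost_eur"):
    mean, low, high = mean_with_ci([getattr(i, quality) for i in instances])
    print(f"{quality}: mean={mean:.1f}, 95% CI=({low:.1f}, {high:.1f})")
```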

(RQ3) Which methods can we use to perform similarity analysis of target processes and current process data?

We will show which methods can be used to compare a target process model with a set of current processes. When applying different similarity methods, we will exploit semantics such as the hierarchical arrangement of process elements, domain semantics, user roles linked to tasks, and rules that influence the process flow. Figure 1 shows the comparison of a target process model with current process instances.

Fig. 1. Determining the similarity between target process model and current process instances

The research questions are expected to yield several contributions. The first contribution is an approach that integrates processes with semantic information that can be queried and processed. We aim to integrate as much semantic information as possible so that a later, enhanced similarity analysis can consider all these aspects. Another contribution is a set of service qualities and dimensions that can be used to compare processes. We will show different metrics and how they can be used in multiple domains. The last contribution is the similarity analysis between target process models and current processes. To quantify the similarity, we will use methods that exploit the semantics captured in the previous step, such as the hierarchy of activities and process flows.
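One established way to exploit a domain hierarchy when comparing process elements is Wu-Palmer similarity, which scores two concepts by the depth of their lowest common subsumer relative to their own depths: sim(a, b) = 2 * depth(lcs) / (depth(a) + depth(b)). The sketch below applies it to a small, hypothetical hierarchy of medical activities; in practice the concepts would come from an ontology such as the ones mentioned in Sect. 1, and Wu-Palmer is just one of several candidate measures.

```python
# Hypothetical domain hierarchy: child concept -> parent concept.
PARENT = {
    "Examination": "Activity",
    "Treatment": "Activity",
    "BloodTest": "Examination",
    "XRay": "Examination",
    "Surgery": "Treatment",
}


def ancestors(concept: str) -> list[str]:
    """The concept itself followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept: str) -> int:
    """Depth in the hierarchy, counting the root as depth 1."""
    return len(ancestors(concept))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(ancestors(b))
    # The first ancestor of a (walking upwards) that is also an ancestor of b
    # is their lowest common subsumer.
    lcs = next((c for c in ancestors(a) if c in ancestors_b), None)
    if lcs is None:
        return 0.0  # no shared root: treat as unrelated
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("BloodTest", "XRay"))     # siblings under Examination
print(wu_palmer("BloodTest", "Surgery"))  # related only via the root
```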

4 Research Methodology and Approach

The structure of the research methodology and approach is derived directly from the research questions (Sect. 3). Research methodologies can be classified as quantitative, qualitative and mixed methodologies. Quantitative research methods collect numerical data and use it to analyze and explain a phenomenon [31]. We will apply quantitative methodologies to plan and approach the research problems. In particular, we will collect a sample of process instances, formalize the data, map the instances to the target process model, calculate the similarity between them and reveal the effect of current processes on service qualities and dimensions. We will investigate how semantics affect the analysis and comparison of processes, and test different methods for comparing processes.

Fig. 2. Research approach, divided into three phases. Each phase tackles a different aspect and influences tasks in other phases.

Figure 2 shows the planned thesis approach, divided into three main phases. Each phase tackles a specific part of the thesis, consists of a set of activities, and influences or is influenced by activities in other phases. In the following, we explain each phase in more detail.

Phase I: We assume that semantics offer great potential for similarity analysis and for revealing the effects on service qualities and process performance indicators. In order to exploit semantics in processes, we first have to annotate current process instances by mapping them to the semantically enriched target model.
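A minimal sketch of this mapping step, under simplifying assumptions: each event label of a current instance is matched to the most similar task label of the target model (using `difflib` string similarity), and, if the match clears a threshold, the event inherits that task's annotations. The target model, annotation keys and threshold are hypothetical; real mappings would likely also use the semantic annotations themselves, as in the approaches cited in Sect. 2.

```python
from difflib import SequenceMatcher

# Semantically enriched target model: task label -> annotations (hypothetical values).
TARGET = {
    "Admit patient": {"concept": "Admission", "role": "nurse"},
    "Take blood sample": {"concept": "BloodTest", "role": "nurse"},
    "Diagnose": {"concept": "Diagnosis", "role": "physician"},
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate_instance(events: list[str], threshold: float = 0.6):
    """Map each event to its best-matching target task and copy that task's annotations.

    Events whose best match is below the threshold stay unmapped: (event, None, {}).
    """
    annotated = []
    for event in events:
        best = max(TARGET, key=lambda task: similarity(event, task))
        if similarity(event, best) >= threshold:
            annotated.append((event, best, dict(TARGET[best])))
        else:
            annotated.append((event, None, {}))
    return annotated


for row in annotate_instance(["admit patient", "blood sample", "call taxi"]):
    print(row)
```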

The definition of service qualities and dimensions partially overlaps with the formal specification of processes with semantics. Both activities are performed in Phase I.

Phase II: This phase focuses on performing similarity analysis of target process models and current processes. We will use different similarity methods, such as node similarity, structural similarity and behavioral similarity. In particular, we will compare methods that do not exploit semantics with methods that do, in order to show the advantages of semantic annotations. We will also consider combining different similarity methods into a hybrid approach.
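The sketch below shows one simple behavioral measure and a hybrid combination. Behavioral similarity is approximated here as the Jaccard index of the directly-follows relations of two sets of traces, a deliberate simplification of richer behavioral measures; the hybrid score is a weighted average of component scores, with weights that would have to be chosen or learned per domain. All names and weights are assumptions.

```python
def follows_relations(traces: list[list[str]]) -> set[tuple[str, str]]:
    """All (a, b) pairs where activity a is directly followed by b in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def behavioral_similarity(target_traces: list[list[str]],
                          current_traces: list[list[str]]) -> float:
    """Jaccard index of the directly-follows relations (1.0 if both are empty)."""
    r_target = follows_relations(target_traces)
    r_current = follows_relations(current_traces)
    union = r_target | r_current
    return len(r_target & r_current) / len(union) if union else 1.0


def hybrid_similarity(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of component similarities (e.g. node and behavioral)."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


target = [["A", "B", "C"]]
current = [["A", "B", "C"], ["A", "C"]]
behavior = behavioral_similarity(target, current)  # 2 shared of 3 relations
print(hybrid_similarity({"node": 1.0, "behavior": behavior}, {"node": 0.5, "behavior": 0.5}))
```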

Phase III: The last phase uses the similarity analysis as input to reveal the effect on service qualities and dimensions. We will evaluate whether current processes have an influence on the service qualities and dimensions. In addition, during this phase we may discover new insights that motivate capturing additional service qualities and dimensions. Therefore, this activity in turn influences Phase II.
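One simple quantitative check for this phase is the correlation between each instance's similarity to the target model and a measured service quality. The sketch computes Pearson's correlation coefficient by hand on invented values; a correlation alone does not establish that deviations cause the change, and non-linear effects would need other methods (e.g. rank correlation).

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical data: similarity of each current instance to the target model
# and the runtime (minutes) observed for that instance.
similarity = [0.95, 0.80, 0.70, 0.55, 0.40]
runtime = [40.0, 46.0, 52.0, 61.0, 75.0]
print(round(pearson(similarity, runtime), 3))
```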

We will show that the methods are not constrained to a single domain by applying them to different domains (Sect. 6).

5 Preliminary Results

Currently, we are working on the first phase (see Sect. 4), which is about formally specifying process models with semantics. To this end, we analyzed different tools for modeling processes. However, existing tools do not allow enriching process data with semantics. In addition, we aim to follow the Linked Data PrinciplesFootnote 6 for publishing data.

In order to combine processes with semantic information, we created a tool that captures processes and allows users to enrich them with semantic information. We used bpmn-io as the web modeler and extended it with further functionality. bpmn-ioFootnote 7 is a JavaScript renderer that allows modeling BPMN processes and checking their syntax. We embedded our tool into a Semantic MediaWikiFootnote 8. Thus, Semantic MediaWiki, in combination with our tool, serves as a platform for capturing, annotating, querying and processing the information in a structured way and for publishing it as Linked Data.
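For illustration, Semantic MediaWiki stores facts through inline annotations of the form `[[Property::Value]]`, which can then be queried and exported. The sketch below renders a task's annotations as such wikitext. The property names (`Has label`, `Has concept`, `Has role`) and the page layout are hypothetical; this is not necessarily how the tool described here stores its annotations.

```python
def to_wikitext(task_id: str, annotations: dict[str, str]) -> str:
    """Render annotations as Semantic MediaWiki inline property annotations."""
    lines = [f"== {task_id} =="]
    for prop, value in annotations.items():
        # [[Property::Value]] both displays the value and records it as a semantic fact.
        lines.append(f"* {prop}: [[{prop}::{value}]]")
    return "\n".join(lines)


print(to_wikitext("Task_Diagnose", {
    "Has label": "Diagnose",
    "Has concept": "Diagnosis",
    "Has role": "Physician",
}))
```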

With this tool, we can integrate processes stored in the standard format BPMN 2.0 XMLFootnote 9 into Semantic MediaWiki and enrich them with semantics. The integrated and semantically enriched processes can in turn be exported to the BPMN 2.0 XML format, allowing for exchange and reuse of the modeled processes.
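The export direction can be sketched with the standard library as well: parse a BPMN 2.0 XML model, attach a `bpmn:documentation` element to a task, and serialize the model again. Putting the annotation into `documentation` is only an assumption for this example; a tool might instead use BPMN `extensionElements` or keep semantics outside the XML. The model and annotation text are invented.

```python
import xml.etree.ElementTree as ET

BPMN = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN)  # keep the conventional "bpmn" prefix on output

xml_in = (
    f'<bpmn:definitions xmlns:bpmn="{BPMN}" id="Defs_1">'
    '<bpmn:process id="P1"><bpmn:task id="Task_A" name="Diagnose"/></bpmn:process>'
    "</bpmn:definitions>"
)

root = ET.fromstring(xml_in)
task = root.find(f".//{{{BPMN}}}task[@id='Task_A']")
# documentation must precede any other child of a BPMN element; this task has none yet.
doc = ET.SubElement(task, f"{{{BPMN}}}documentation")
doc.text = "concept=Diagnosis; role=Physician"
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```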

As a next step, we will determine service quality measures and dimensions, such as runtime, outcome or costs, both for comparing processes and for measuring the efficiency of a target process, and we will study approaches to map current process instances to the target process model. In addition, we will consider different similarity methods to quantify the similarity between the target process and current processes, as well as the information needed to improve the similarity calculation. These considerations will influence the enrichment with semantic information, since this information has to be captured during the annotation of the processes.

6 Evaluation Plan

To validate our solution, we will implement the designed approach and methods in different use-case scenarios. On the one hand, this ensures that our approach and methods are independent of the application domain; on the other hand, it yields independent results that can be evaluated.

We plan to use the following two domains to evaluate our approach:

(1) Medical Domain: Current processes in clinics differ from target process models. This is caused by the latest insights and developments in the medical domain and by the slow adoption of clinical pathways. In addition, there are many ontologies, e.g. the Foundational Model of Anatomy ontology (FMA)Footnote 10 or the Gene OntologyFootnote 11, that can be used to structure processes with semantic information. Therefore, we will use our approach to calculate the similarity between target and current processes and show the influence of processes on different service qualities and dimensions.

(2) Internet of Things: Another field of application is the Internet of Things. In this domain, the communication and data flow between devices is not strictly predefined. Hence, there are more ad-hoc processes, which makes it hard to get an overview of the processes in general. Although this domain is rather new, some ontologies are already available [32, 33]. We will use data from devices (e.g. communication data and process data) and annotate the tasks with semantic information. This enables enhanced analysis of communication workflows and allows us to see how far current processes deviate from target process models.

To evaluate the first research question, we will validate the formalized process data enriched with semantics by comparing the usability of the provided methods with other approaches and by assessing the expressiveness of the formally specified processes. The formalization of data should not be tailored to a specific scenario or domain, which we will show by applying our approach and methods in multiple scenarios and domains. The mapping of current process instances to their target process model will be validated by comparing it with different methods.

To evaluate the second and third research questions, we will start by comparing very simple target and current process models and gradually extend the processes with further details and expressiveness. That is, in each applied domain we will start the similarity analysis and the analysis of the effect on different service qualities and dimensions with a sequential process, and then successively extend the expressiveness of the process and the set of service qualities and dimensions used.
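For the purely sequential processes used as a starting point, one straightforward similarity measure is the normalized edit (Levenshtein) distance between activity sequences: the minimum number of insertions, deletions and substitutions, divided by the length of the longer sequence. The sketch below is one such measure under that assumption; richer process structures (gateways, parallelism) would need the graph- or behavior-based measures discussed in Sect. 2.

```python
from collections.abc import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two activity sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def sequence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """1 - normalized edit distance, in [0, 1]; two empty sequences are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


target = ["Admit", "Diagnose", "Treat", "Discharge"]
current = ["Admit", "Diagnose", "Discharge"]
print(round(sequence_similarity(target, current), 3))
```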

7 Conclusions

We aim to develop an approach to annotate target and current processes and to perform similarity analysis between them.

We will consider the similarity in relation to service qualities and dimensions in order to (1) provide confidence intervals for service qualities and dimensions, so that one can estimate which values a process variable of the target process model is likely to take, (2) reveal weak spots that influence different service qualities and dimensions, and (3) motivate the adaptation of the target process model if its current process instances diverge too much from it.

In addition, the knowledge gained from this approach can be used to support people in optimizing processes and improving target process models.