1 Introduction

Business Process Management (BPM) remains a central, foundational element of running organizations today [1]. Companies use business process models to document various business operations, such as the process for hiring a new employee or clearing a travel reimbursement claim. Often, process models remain buried in unstructured documents as images or screenshots. They may be drawn using standard drawing tools such as PowerPoint or specialized modeling tools such as Visio. In either case, such embedded process model images can quickly become obsolete as the underlying business process evolves. This makes the transitioning of tasks and the maintenance of process know-how ad hoc and manual.

Thus, there is significant value in digitizing such unstructured images using image processing technologies. The digitized flows can be used to understand deviations in actual process operations, or be updated to reflect the evolution of the process itself.

Standard off-the-shelf image recognition tools (such as Google's Vision API or IBM's Watson Visual Recognition) cannot recognize the various shapes and connectors in business process model images drawn according to the Business Process Model and Notation (BPMN) standard [2].

In this paper, we provide a technique to automatically identify images describing business processes and convert them to the standard BPMN format (as shown in the left and right parts of Fig. 1, respectively). To the best of our knowledge, we are the first to provide such a system.

Our specific contributions in this paper are as follows:

  1. Given any image, automatically identify if the image represents a business process (using a Convolutional Neural Network (CNN)).

  2. Given a business process image, automatically identify 64 different kinds of process model shapes (using another CNN).

  3. Using optical character recognition techniques, extract text from the different shapes of a process model image.

  4. Identify the flow of activities in a business process image using computer vision techniques.

  5. Generate the output BPMN XML representing the input image.

The remainder of this paper is structured as follows: Sect. 2 describes our deep learning based approach to transform a given image into a business process model. Section 3 shows our evaluation while Sect. 4 puts our work in the context of related works. We conclude in Sect. 5.

Fig. 1. Sample business process model image (left) and its BPMN XML (right).

2 Approach (System Architecture and Implementation)

In this section, we describe our approach. The input to our system is an image (in standard formats such as JPEG, PNG) and the output is an XML in the BPMN format (as shown in the left and right parts of Fig. 1, respectively).

A high-level architecture diagram of our approach is shown in Fig. 2. Each step in our approach is described in a separate sub-section below:

Fig. 2. System architecture diagram.

2.1 Identifying Business Process Model Images

We used a binary classifier to distinguish between business process model images and non-business process model images. To train the binary classifier, we need training data; essentially, a large number of BPMN images such as the one shown in the left part of Fig. 1. For the model to learn effectively, there needs to be variety in the training data. Figure 1 has eight different activities spread across three swimlanes. It has five tasks (rectangles), one decision or gateway (rhombus/diamond) and two events (circles). There are other basic BPMN shapes as well which are not used in Fig. 1. Thus, we need to create variants of Fig. 1 with a different number of tasks, gateways, events, swimlanes and other BPMN shapes. Using the Activiti Java library, we programmatically created 35,000 BPMN XML files with a varying number of swimlanes and a varying number of BPMN entities such as tasks, events and gateways. We then imported the BPMN XML files into an open source BPMN modeling tool, http://bpmn.io/, to render the process models. Finally, we automatically downloaded the resulting model images using Selenium, the browser automation tool, as sketched below. Thus, we gathered 35,000 business process model images.
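As an illustration, the render-and-download step can be scripted roughly as follows. This is a minimal sketch: the local page URL and the globally exposed bpmn-js `viewer` object are assumptions about the setup, not details from the paper.

```python
# Sketch: render each generated BPMN XML in bpmn-js and save a screenshot.
import glob
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://localhost:8000/viewer.html")  # assumed local page embedding bpmn-js

for i, path in enumerate(sorted(glob.glob("generated_xml/*.bpmn"))):
    with open(path, encoding="utf-8") as f:
        xml = f.read()
    # bpmn-js offers importXML; we assume the page stores the viewer globally.
    driver.execute_script("window.viewer.importXML(arguments[0]);", xml)
    time.sleep(0.5)  # crude wait for the (asynchronous) import to render
    driver.save_screenshot(f"images/model_{i:05d}.png")

driver.quit()
```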

To handle variations in rendering across modeling tools, we also used Microsoft Visio to generate another 20,000 business process model images: we used Visual Basic for Applications (VBA) to programmatically create 20,000 process models in Visio and exported them as PNG images. Using Activiti, an open source Java BPM library, we created another 10,000 images. Finally, we created 5,000 BPMN images with another modeling tool, IBM Blueworks Live. We thus generated 70,000 business process model images overall.

We also created an equal number of non-process model images. To create this set, we could have downloaded random images from the web. However, such a set would not help the classifier distinguish between a process model (as in Fig. 1) and a similar-looking diagram such as a flow chart or a system architecture diagram (as in Fig. 2). We thus automatically created charts and other diagrams that look superficially similar to process models.

We set aside 20% of the above set of 140,000 images (i.e., 28,000 images) for evaluation (described in Sect. 3). With the remaining 112,000 images we trained different binary classifiers [3] as shown below:

  1. Neural Network (Multi Layer Perceptron [MLP])

  2. Support Vector Machines

  3. Random Forests

  4. Decision Tree

  5. Naive Bayes

  6. Logistic Regression.

During training, each image was first resized to a resolution of \(224 \times 224\) pixels and passed through a popular Convolutional Neural Network (CNN) called VGG19. We extracted 4096 features from its last fully connected layer before the classification layer, which were then used for classification with each of the above classifiers. The evaluation of this model is described in Sects. 3.1 and 3.2.
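A minimal sketch of this pipeline is shown below, assuming Keras and scikit-learn (the paper does not name its deep learning or classifier libraries); `train_paths` and `labels` stand in for the training set, and the MLP hidden size is illustrative.

```python
# Sketch: VGG19 feature extraction followed by an MLP binary classifier.
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image
from sklearn.neural_network import MLPClassifier

base = VGG19(weights="imagenet", include_top=True)
# Truncate at the 4096-dimensional fully connected layer ('fc2').
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def extract_features(path):
    img = image.load_img(path, target_size=(224, 224))  # resize as in the paper
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]  # 4096-dimensional feature vector

X = np.stack([extract_features(p) for p in train_paths])  # train_paths: image files
clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500).fit(X, labels)
```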

2.2 Contour Detection

We now describe our approach to automatically extracting contours. A contour can be described simply as a curve joining all the continuous points along a boundary that have the same color or intensity. Contours are a useful tool for shape analysis, object detection and recognition.

The goal here is that given a business process model image input as in Fig. 1, we want to identify contours such as the contour denoting the rectangle around the shape with the text “Submit Reimbursement Claim”. A shape may have sub-contours (especially if there is an icon inside it as in Fig. 3), while the entire Fig. 1 can be considered a contour. Thus, we need to find contours at the appropriate level (i.e., the level of shapes). We use the OpenCV library [4] to find contours as described below:

We first find all the contours recognized by OpenCV in the input image. We then check whether a contour is the outer-most contour (which represents the entire figure) or a child contour, i.e., a contour within another contour. We ignore the inner-most child contours and the outer-most contour. The remaining contours are at the level of the swimlanes, shapes and edges, which is the appropriate level for the shape detection described next.
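The sketch below illustrates this filtering with OpenCV's contour hierarchy; the exact depth-based rule is a simplification of the description above, not the paper's precise logic.

```python
# Sketch: keep contours at the shape level using cv2.RETR_TREE's hierarchy,
# where each entry is [next, previous, first_child, parent].
import cv2

img = cv2.imread("process_model.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV)

contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE,
                                       cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x

shape_contours = []
for idx in range(len(contours)):
    _next, _prev, first_child, parent = hierarchy[0][idx]
    if parent == -1:
        continue  # outer-most contour: the entire figure
    if first_child == -1 and hierarchy[0][parent][3] != -1:
        continue  # inner-most contour nested below the shape level (e.g. an icon)
    shape_contours.append(contours[idx])
```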

2.3 Identifying Different BPMN Shapes

Here we followed an approach similar to the one used for identifying whether an image represents a business process model, as described in Sect. 2.1. We used the same set of six classifiers, the difference being that these classifiers were not binary: given a contour, they were trained to classify it as one of 64 different BPMN shapes, a subset of which are shown in Fig. 3.

As before, we need to create training data for these shapes. We will describe the procedure for one shape, viz., the start shape denoted by a circle (the very first shape in the first row of Fig. 3). We drew this shape using the tool bpmn.io. We then downloaded the resulting BPMN XML and programmatically created variants of the same shape by varying its properties (for example, altering the radius for a circle, changing the width and height for a rectangle, varying the text in the shape and so on). For each variation, we used Selenium to import the varied XML into the bpmn.io tool and then downloaded the rendered image. We thus created 3,000 images for this one shape, and repeated the process for all 64 shapes. The resulting 192,000 images were used to train another model whose task is to distinguish between the 64 different shapes.

Here too we scaled the images to a resolution of \(224 \times 224\) pixels, passed them through the VGG19 Convolutional Neural Network (CNN) and used the 4096 features to classify a shape as belonging to one of the 64 types. The evaluation of this model is described in Sects. 3.1 and 3.2.

Fig. 3. A subset of the BPMN shapes used in training.

2.4 Text Extraction from Shapes

We now describe our text extraction technique. Most shapes have their text inside, but some have the text below them, as shown in Fig. 1. Typically, tasks have text inside the shape, whereas shapes like start events, end events and certain kinds of gateways have the text below them. We used the well-known optical character recognition tool Tesseract [5] to extract text, as sketched below. As Tesseract is a well-proven tool, we did not evaluate text extraction.
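A sketch of this step is shown below, assuming the pytesseract Python binding for Tesseract; the 30-pixel strip below a shape is an illustrative value, not one reported in the paper.

```python
# Sketch: extract text inside a shape, or from a strip just below it.
import cv2
import pytesseract

def shape_text(img, contour, text_below=False):
    x, y, w, h = cv2.boundingRect(contour)
    if text_below:  # e.g. start/end events and certain gateways
        roi = img[y + h : y + h + 30, x : x + w]  # 30 px strip: an assumption
    else:           # e.g. tasks: text is inside the shape
        roi = img[y : y + h, x : x + w]
    return pytesseract.image_to_string(roi).strip()
```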

2.5 Sequence Flow Detection (Edge Detection)

Here we describe flow (edge) detection. The challenging part of edge detection is detecting the edge direction. The arrow heads which denote the edge direction are actually small lines and may not be detected by existing line detection algorithms. Thus, we need another approach to detect the direction: we partition the contour (i.e., a small rectangle around the edge) into two equal halves. The half with the arrow head will have more foreground pixels, and thus greater intensity, than the half without it. From this we can infer the direction, as sketched below. Evaluation of edge detection is described in Sect. 3.2.
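The direction test can be sketched as follows for a roughly horizontal edge (the vertical case is symmetric); `binary` is assumed to be a binarized image with white (255) foreground.

```python
# Sketch: compare foreground pixel mass in the two halves of the edge's
# bounding box; the denser half contains the arrow head.
import cv2

def horizontal_edge_direction(binary, contour):
    x, y, w, h = cv2.boundingRect(contour)
    roi = binary[y : y + h, x : x + w]
    left, right = roi[:, : w // 2], roi[:, w // 2 :]
    return "right" if right.sum() > left.sum() else "left"
```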

2.6 Generating Output

Algorithm 1 outlines how we generate the XML (as in the right half of Fig. 1) which constitutes the output of our technique. We used the Activiti Java library to aid in the XML generation. As the output generation is primarily an engineering task consisting of putting together the identified shapes, edges and text, we do not evaluate it.

[Algorithm 1: outline of the BPMN XML generation]
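The paper performs this step with the Activiti Java library; purely as an illustration of the output structure, the Python sketch below assembles a minimal BPMN 2.0 document from detected shapes and edges (the input tuple formats are assumptions).

```python
# Sketch: emit minimal BPMN 2.0 XML from detected elements.
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("", BPMN_NS)

def to_bpmn_xml(shapes, edges):
    # shapes: list of (id, bpmn_tag, label), e.g. ("t1", "userTask", "Submit Claim")
    # edges:  list of (source_id, target_id) pairs
    defs = ET.Element(f"{{{BPMN_NS}}}definitions")
    proc = ET.SubElement(defs, f"{{{BPMN_NS}}}process", id="process_1")
    for sid, tag, label in shapes:
        ET.SubElement(proc, f"{{{BPMN_NS}}}{tag}", id=sid, name=label)
    for i, (src, tgt) in enumerate(edges):
        ET.SubElement(proc, f"{{{BPMN_NS}}}sequenceFlow",
                      id=f"flow_{i}", sourceRef=src, targetRef=tgt)
    return ET.tostring(defs, encoding="unicode")
```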

3 Evaluation

In this section, we describe our evaluation. Our experiments were geared towards answering the following research questions (RQ):

  • RQ1: With what accuracy can we distinguish between process model images and non-process model images?

  • RQ2: How accurately can we distinguish between the 64 BPMN shapes?

  • RQ3: What is the accuracy of our edge detection (i.e., sequence flow)?

To answer the above research questions, we used two types of data:

  1. Automatically generated data

  2. Data obtained from our clients.

In the following sub-sections, we describe our evaluation with the above two sets of data. Across both sets, the standard evaluation metrics of precision, recall and F1-Score are used.
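These metrics can be computed, for example, with scikit-learn (an assumption about tooling; the paper names only the metrics themselves), where `y_true` and `y_pred` are the gold labels and predictions.

```python
# Sketch: precision, recall and F1 from gold labels and predictions.
from sklearn.metrics import precision_recall_fscore_support

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")  # use average="macro" for the 64-class task
```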

3.1 Evaluation with Generated Data

As described in Sect. 2, we had accumulated a set of 140,000 images consisting of process models and non-process models, of which we had set aside 20% (i.e., 28,000 images) for testing. We used these images to answer our research question RQ1. While we performed the experiments with all the classifiers described in Sect. 2, we show the results for the MLP (Multi Layer Perceptron) classifier [3] as it had the best results. The first row of Table 1 shows the results: we achieve very high precision and recall.

For answering RQ2, i.e., the precision and recall of shape identification, we set aside 20% of the 192,000 shape images we had generated for shape detection training. These 38,400 shape images were used to measure the precision and recall of shape detection. The second row of Table 1 shows the results: we again achieve very high precision and recall.

For BPMN image identification and shape detection, we used deep learning and hence generated training data; thus, we evaluated on a held-out set of that data. However, for edge detection, i.e., RQ3, we did not use a learning approach and hence do not evaluate it here; its evaluation is described in the next sub-section.

Table 1. Evaluation on generated data (MLP classifier)

3.2 Evaluation with Client Data

We gathered twenty (20) random documents (PowerPoint or Word) from our clients, which contained a mixture of business process model images and other images such as charts, app screenshots and so on. Overall, there were 96 images, 24 of them being business process model images. These 24 business process models contained 200 shapes and 210 edges overall.

Note that, unlike for the generated data set, here we do not have a readily available oracle or gold set; we have to manually inspect the documents and images to find the total number of business process model images, shapes and edges. Further, we have to manually compare the output of our technique with the actual image to compute precision and recall for images, shapes and edges. As this is a time-consuming task, we have fewer data points in this evaluation than in the evaluation on generated data.

Table 2 shows the results. As in the evaluation on generated data, the Multi-Layer Perceptron (MLP) performed best and its results are depicted. As can be seen from the table, our technique achieves good precision and recall for BPMN image, shape and edge identification on client data.

Table 2. Evaluation on client data (MLP classifier)

3.3 Threats to Validity

Our approach is able to identify 64 different types of BPMN shapes, many of which are similar to each other. The overall number of allowed shapes in the official BPMN standard [2] is about 360. Thus, it is possible that our approach may not work well on all variants. However, we believe that, given additional training data depicting the shapes not covered so far, our approach can identify all the different BPMN shapes.

4 Related Work

To the best of our knowledge, we are the first to automatically identify images that depict process models and convert a business process model image into the standard BPMN XML format.

Sethi et al. [6] propose an approach to extract and understand deep learning design flow diagrams in research papers and convert them into execution-ready source code. In their setting, there is no need to detect different kinds of shapes (as the diagrams contain only rectangles), the vocabulary is limited, which makes text recognition easier, and the sequence flow is linear (top-down). Thus, their approach cannot be used as-is for our problem.

Another broad category of work deals with generating synopses from images such as tables and figures in papers [7], and extracting information from line graphs [8] and charts [9]. None of these works can be used, partly or entirely, in our approach.

5 Conclusion

We described a novel approach using deep learning to automatically identify process model images. We then proposed an automated approach to transform a process model image into the standard BPMN XML format. This conversion involved identifying different BPMN shapes using convolutional neural networks, detecting edges between the shapes via image processing, extracting text from the shapes utilizing optical character recognition and finally generating the BPMN XML. We demonstrated empirically that our approach works well with good precision and recall.