
1 Introduction

This paper examines the question of how to query, filter (i.e., manipulate), and transform large, complex process models to gain insight into improving model quality. The focus is on process models only; process execution and log mining are out of scope, as many of the world’s largest organizations have fragmented process management efforts, in various stages of maturity, scattered across disconnected departments. There is value in applying process querying to business process models without considering downstream process execution.

The query method presented here was tested on enterprise architecture models in automotive manufacturing. The development lifecycle for new vehicles (from initial marketing concept through to ready-for-mass-production) spans years. At any given time, there are dozens of concurrent vehicle programs in progress globally. The lifecycle was modeled as one process with variants by program scale and vehicle model. The process models shared 700+ different activity types (both sub-processes and tasks), instantiated as 4,000+ activity instances, interconnected via 7,000+ workflows, and performed by 10,000+ process workers who assumed two dozen roles.

Modeling was performed in OpenText ProVision [3], a commercial enterprise architecture modeling tool. ProVision integrates with Excel, enabling rapid model creation and manipulation using tabular data. The process querying lifecycle involves three steps: model inquiry, manipulation, and update. Model inquiry involves searching process XML data to focus on key process areas, generally to improve model quality. Model manipulation alters the process model XML using XQuery, XSL, and Excel macros. Model update applies the changes so that ProVision renders revised process model data that highlights the query results.

Prior to establishing a BPM practice with ProVision, the product development lifecycle was planned as a Gantt-style program schedule in Microsoft Project with occasional efforts to model processes in Visio. Neither Project nor Visio adequately met process modeling needs, as model size and complexity were overwhelming. Even after the introduction of ProVision models, some stakeholders resisted enterprise modeling. To secure BPM buy-in, stakeholders needed a way to move from process complexity to insight so they could incrementally refine model quality. The solution came in the form of process queries that filter out irrelevant details to focus attention on problems and opportunities within process models. Once value was established, demand for BPM and process querying increased.

This paper is structured as follows. Section 2 states the problem being addressed, namely the challenges of understanding, analyzing, and improving large, complex manufacturing processes when the subject matter experts responsible for authoring the process definitions approach them from different perspectives. Section 3 describes our method for querying these process models (i.e., model inquiry, manipulation, and update) to generate filtered process views from multiple perspectives that focus attention on opportunities for improvement. Section 4 presents results and extensions of work into related areas. Section 5 concludes with limitations, open problems, and lessons learned.

2 The Problem: Large, Complex Process Models

The automotive product development pipeline (from marketing concept to ready-for-mass-manufacturing) takes years. In this case, the entire process spanned 50+ process maps. Each map was printed on a one-by-two-meter poster and displayed at least 70 activities. These posters, taped across the walls of a large room, were so complex that even with a magnifying glass, workflows were nearly impossible to follow. The maps were unwieldy and difficult to read because they presented too much unfiltered, detailed process information (see Fig. 1).

Fig. 1. Complex process maps, before query and filtering.

2.1 Differing Stakeholder Perspectives

Stakeholders, particularly the authors responsible for process design, approached models from different viewpoints (e.g., management, engineering, purchasing). From these viewpoints, they searched for different views (e.g., value streams, program schedules, workflow simulations, functional, business information, applications & services). The combination of viewpoints and views established search perspectives.

2.2 Quality Issues

BPM models were created by aggregating program management data from different teams using Microsoft Project, Visio, and custom, in-house applications. Once collected, data was loaded into Excel and split across object tables (e.g., activities) and link tables (e.g., workflows to interconnect activities). The resulting Excel files were imported into ProVision’s inventory of modeling objects and links between objects. This approach to creating BPM models was faster than creating them by hand, but there were quality issues with the input data that led to model inconsistencies such as:

  • Graph completeness problems [4]: missing inputs or outputs.

  • Temporal problems [5]: inputs available after activity start, outputs produced too late, missing activity duration.

  • Attribute quality problems: missing author/resources, typos in descriptions.
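Each class of input-data problem listed above can be detected mechanically once the model is available as XML. The following is a minimal Python sketch, not the tooling used in this work, that assumes a hypothetical, simplified layout: <activity> elements with an id attribute and optional <author> and <workTime> children, and <workflow> elements whose from/to attributes hold activity IDs. Real CIF data stores the author as a custom property and uses different element names, so the paths would need adapting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout; not the actual CIF.xsd structure.
MODEL_XML = """
<model>
  <activity id="A1"><author>John Doe</author><workTime>5</workTime></activity>
  <activity id="A2"><author></author></activity>
  <workflow id="W1" from="A1" to="A2"/>
  <workflow id="W2" from="A2" to="A9"/>
</model>
"""

def quality_report(root: ET.Element) -> dict[str, list[str]]:
    """Return the IDs of objects that exhibit each class of quality problem."""
    # Simplification: only activities count as valid workflow endpoints here;
    # a real check would also accept start, end, and gateway elements.
    endpoints = {a.get("id") for a in root.iter("activity")}
    report: dict[str, list[str]] = {
        "missing_author": [], "missing_work_time": [], "dangling_workflow": []}
    for act in root.iter("activity"):
        # Attribute quality: <author> absent or empty.
        if not act.findtext("author", default="").strip():
            report["missing_author"].append(act.get("id"))
        # Temporal quality: no activity duration recorded.
        if not act.findtext("workTime", default="").strip():
            report["missing_work_time"].append(act.get("id"))
    for wf in root.iter("workflow"):
        # Graph completeness: an end is missing or points to no known element.
        if wf.get("from") not in endpoints or wf.get("to") not in endpoints:
            report["dangling_workflow"].append(wf.get("id"))
    return report

report = quality_report(ET.fromstring(MODEL_XML))
# The author-completeness quality query reduces to an emptiness check:
author_complete = not report["missing_author"]
```

On this toy input, A2 is flagged for both a missing author and a missing work time, W2 is flagged as dangling, and author_complete is False.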

3 Querying Business Processes

Process queries helped stakeholders to navigate interconnected models and to discover model improvement opportunities. For example,

  • Given an author, find all of his/her activities within a workflow model.

  • Given an artifact, find all activity usages (i.e., instantiations of an activity).

  • Given a milestone, find the distinct list of artifacts that cross swim lane boundaries.

  • Given a milestone, find the distinct set of artifacts that are associated with workflows that cross swim lane boundaries (e.g., bill-of-material information handed off from one team to another).

  • Given a role, highlight all activities performed by that role (e.g., marketing).

  • Given a parameter, find model objects (e.g., activities) with a matching attribute.

  • Quality query: given a process model, verify that there are no workflows with one end detached (i.e., no dangling workflows).

  • Quality query: given a process model, assert that the number of activities whose author is NOT missing equals the total number of activities.

  • Compliance query: given a process model, compare it to the APQC reference model for automotive manufacturing to assess standards compliance and note drift [6].

  • Navigation queries: use query results as input to the next query, and repeat to navigate within and across models. For example, given an author, find all of his/her activities. Then, for each activity, trace all input workflows to find only those upstream activities owned by a different author. In this way, two authors can discover their dependencies on each other and coordinate their teams’ planned process details.
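The navigation query in the last bullet chains two queries: the first returns an author's activities, and the second consumes that result to trace input workflows. A minimal Python sketch over a hypothetical, simplified XML layout (an <author> child per activity, from/to activity IDs on each <workflow>) might look like this; the function names are illustrative, not part of ProVision or any PQL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout: each activity has an <author>,
# and each workflow links a source (from) to a target (to) activity ID.
MODEL_XML = """
<model>
  <activity id="A1"><author>Ann</author></activity>
  <activity id="A2"><author>Bob</author></activity>
  <activity id="A3"><author>Ann</author></activity>
  <activity id="A4"><author>Ann</author></activity>
  <workflow id="W1" from="A1" to="A2"/>
  <workflow id="W2" from="A2" to="A3"/>
  <workflow id="W3" from="A4" to="A3"/>
</model>
"""

def owners(root: ET.Element) -> dict[str, str]:
    """Map each activity ID to its author (empty string if missing)."""
    return {a.get("id"): a.findtext("author", default="").strip()
            for a in root.iter("activity")}

def activities_by_author(root: ET.Element, author: str) -> list[str]:
    """Query 1: IDs of all activities owned by `author`."""
    return [aid for aid, who in owners(root).items() if who == author]

def foreign_upstream(root: ET.Element, author: str) -> list[tuple[str, str, str]]:
    """Query 2: feed query 1's result into a trace of input workflows and keep
    only upstream activities owned by someone else, as (source, owner, target)."""
    owner = owners(root)
    mine = set(activities_by_author(root, author))
    result = []
    for wf in root.iter("workflow"):
        src, dst = wf.get("from"), wf.get("to")
        if dst in mine and src in owner and owner[src] != author:
            result.append((src, owner[src], dst))
    return result

root = ET.fromstring(MODEL_XML)
deps = foreign_upstream(root, "Ann")  # [('A2', 'Bob', 'A3')]
```

Running the same query for Bob returns [('A1', 'Ann', 'A2')], so each author sees the other side of the dependency.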

3.1 Model Inquiry

Process data conformed to ProVision’s common interchange format (CIF.xsd), an XML schema supporting model portability across vendor platforms (Fig. 2).

Fig. 2. An XML fragment of a process activity.

On lines 7–9 of this XML fragment, the stakeholder responsible for this activity is stored in the author custom property, an important query search key. On line 1, the activity id “157896” is referenced throughout the process model to refer back to this activity (e.g., to connect it to workflows). These reference IDs were chained together to enable navigation queries and nested searches, and were followed repeatedly as users traversed workflows. The missing data on line 4, <workTime>, is an example of poor-quality input data that hinders queries such as finding critical paths to reduce time-to-market. Consider the query in Fig. 3: given an author, find all of her activities.

Fig. 3. XQuery to return a collection of all activities owned by $author

When this query runs, the result is a set of zero or more <member> elements as shown on lines 3–5 in Fig. 4. The set <members> in <modelScenario> includes only those activities owned by $author.

Fig. 4. XQuery result set with references to all activities owned by $author = “John Doe”

ProVision uses the <modelScenario> element to manage process simulation scenarios, which it renders as graph layers. We discovered that this element can be overloaded to create a collection of model layers, which, when superimposed on each other, filter out irrelevant process details to focus attention on query results.
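The XQuery in Fig. 3 is not reproduced here, but the same idea (select an author's activities and wrap references to them in a <modelScenario> layer) can be sketched with Python's ElementTree. The <modelScenario>, <members>, and <member> element names come from the text above; the input layout, the name attribute, and the ref attribute holding each activity ID are hypothetical stand-ins for the CIF structure shown in Figs. 2 and 4.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input; the real CIF layout is shown in Fig. 2.
MODEL_XML = """
<model>
  <activity id="157896"><author>John Doe</author></activity>
  <activity id="157897"><author>Jane Roe</author></activity>
  <activity id="157898"><author>John Doe</author></activity>
</model>
"""

def author_layer(root: ET.Element, author: str) -> ET.Element:
    """Return a <modelScenario> whose <members> reference the author's activities.

    The result holds zero or more <member> elements, mirroring Fig. 4.
    """
    scenario = ET.Element("modelScenario", {"name": f"Activities owned by {author}"})
    members = ET.SubElement(scenario, "members")
    for act in root.iter("activity"):
        if act.findtext("author", default="").strip() == author:
            # 'ref' is a hypothetical attribute holding the activity reference ID.
            ET.SubElement(members, "member", {"ref": act.get("id")})
    return scenario

layer = author_layer(ET.fromstring(MODEL_XML), "John Doe")
print(ET.tostring(layer, encoding="unicode"))
```

Each query produces one such layer; superimposing layers on the full model is what filters out the irrelevant detail.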

3.2 Model Manipulation

This stage of the querying lifecycle alters the XML of a process model. For instance:

  • Given query results, insert the results as rows into a new process model layer

  • Given task duration data, populate the work time for each activity

  • Given inter-activity timing data, populate the transit time for each workflow
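A minimal sketch of the second and third bullets, assuming the duration and timing data have already been read into dictionaries keyed by activity or workflow ID. The <workTime> element name appears in the CIF fragment discussed above; the simplified layout and the <transitTime> element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout; A1 lacks a <workTime>, A2 has a stale one.
MODEL_XML = """
<model>
  <activity id="A1"/>
  <activity id="A2"><workTime>1</workTime></activity>
  <workflow id="W1" from="A1" to="A2"/>
</model>
"""

def set_child_text(parent: ET.Element, tag: str, value) -> None:
    """Create <tag> under parent if it is absent, then overwrite its text."""
    child = parent.find(tag)
    if child is None:
        child = ET.SubElement(parent, tag)
    child.text = str(value)

def populate_times(root: ET.Element, durations: dict, transits: dict) -> None:
    """durations: activity ID -> work time; transits: workflow ID -> transit time."""
    for act in root.iter("activity"):
        if act.get("id") in durations:
            set_child_text(act, "workTime", durations[act.get("id")])
    for wf in root.iter("workflow"):
        if wf.get("id") in transits:
            # <transitTime> is a hypothetical element name.
            set_child_text(wf, "transitTime", transits[wf.get("id")])

root = ET.fromstring(MODEL_XML)
populate_times(root, {"A1": 4, "A2": 6}, {"W1": 2})
```

Updating an existing element in place, rather than appending a second one, keeps the manipulated file valid against a schema that allows a single <workTime> per activity.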

This stage was challenging because most of the work was performed manually: editing boilerplate CIF.xml process data (similar in format to XPDL) and inserting query results. Note that the process definition files were often over 100 MB in size.

3.3 Model Update

Updating a model involved uploading a manipulated model definition into ProVision. In some cases, post-processing was applied using JavaScript to color workflows, highlighting them to draw attention to gaps, overlaps, and errors (see Fig. 5).

Fig. 5. JavaScript to highlight process model elements during model update

When the collection of activities owned by a given author was combined with color highlighting of the workflows by stereotype, the resulting filtered process layer overlaid the ghosted process layer and was rendered as shown in Fig. 6.
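The highlighting script itself appears only in Fig. 5. As an illustration of the same idea, the Python sketch below colors the workflows that belong to a query result by stereotype and ghosts the rest. The stereotype names (gap, overlap, error), the color attribute, and the palette are assumptions made for illustration, not ProVision's actual data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical stereotype names and colors; the actual script appears only in Fig. 5.
PALETTE = {"gap": "#FFA500", "overlap": "#008000", "error": "#FF0000"}
DEFAULT = "#000000"  # selected workflow with an unrecognized stereotype
GHOST = "#D3D3D3"    # light gray for workflows outside the query result

def highlight(root: ET.Element, selected: set[str]) -> None:
    """Set a hypothetical 'color' attribute on every workflow."""
    for wf in root.iter("workflow"):
        if wf.get("id") in selected:
            wf.set("color", PALETTE.get(wf.get("stereotype", ""), DEFAULT))
        else:
            wf.set("color", GHOST)

root = ET.fromstring("""
<model>
  <workflow id="W1" stereotype="gap"/>
  <workflow id="W2" stereotype="error"/>
  <workflow id="W3" stereotype="overlap"/>
</model>""")
highlight(root, {"W1", "W2"})
```

Here W1 and W2 are colored by stereotype while W3, which is not in the result set, is ghosted.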

Fig. 6. Filtered workflow map with query results highlighted for visibility. (Color figure online)

4 Results and Extensions of Work

This process querying work helped a multi-disciplinary team realize a significant (undisclosed) reduction in time-to-market for a multi-year manufacturing process, while improving overall model quality. Further, BPM gained acceptance among skeptics within the organization. As a result, the work expanded beyond its original scope of improving BPM model quality.

4.1 Queries Applied to Other Enterprise Models

TOGAF, as it was used, specified seven types of enterprise models: strategy, organization, capability, process, information, application, and technology. This approach to querying process models was equally useful when applied to querying other enterprise models. Examples follow:

  • An executive might start with a process model, and then search for the TOGAF business capability it implemented, and in turn, navigate to the associated TOGAF value stream.

  • An enterprise architect might start with a process model, navigate to the information model it required, and then navigate to the application models that produced the information required by the process.

  • A process author, assuming the role of purchasing manager, might start by searching for an activity within the purchasing swim lane, then navigate upstream to work performed within other swim lanes (such as marketing or engineering). S/he could examine the attached artifacts (e.g., inputs such as marketing features or CAD data), and then create a new sub-process activity to handle bottlenecks (e.g., if substitute parts had become necessary due to supplier issues).

4.2 Enterprise Architecture Portal

Process models were part of a broader collection of enterprise model portfolios [8], collectively containing over 800,000 model objects and covering all aspects of the company. The approach to process querying also applied to other model types within the TOGAF framework. An enterprise architecture portal was built so stakeholders could query all model types and navigate interconnected, filtered model views. Stakeholders did so by selecting a model portfolio (e.g., vehicle design) and then selecting from configurable filters (e.g., filter by process map stage and organization perspective), as shown in Fig. 7.

Fig. 7. Portal for exploring TOGAF enterprise models which included process models.

Model metadata drove the portal’s content and included support for multiple portfolios of models, as shown in Fig. 8. This approach can be useful when comparing two sets of models, each from a different authoring tool.

Fig. 8. XML data to populate model entries in enterprise architecture portal

4.3 Querying Workflows and Artifacts to Discover Micro-services

A firm-wide effort existed to replace legacy information systems with cloud-based micro-services. Part of this work involved identifying workflows where process participants used email to hand off information across swim lane boundaries, a practice that led to document management issues and rework. Combining query results that identified sets of candidate workflows with lists of end-of-life systems provided a short-list of migration-eligible systems, as shown in Fig. 9.

Fig. 9. Querying information flows to map legacy information systems to microservices

An example of a candidate information flow is the activity “PR-849 Purchase Parts” in the purchasing swim lane of Fig. 6. Such activities were identified by exporting process models to Excel and searching the descriptions of system, workflow, artifact, and activity objects with regular expressions to find target data (e.g., parts data, order data, CAD files). While this worked, a better approach would have been to use a process query language [7] with a search query along the lines of the following SQL-like pseudo code:

figure a

Ideally, such a PQL statement would be able to invoke regular expression searches (e.g., “(part|BOM).*data”) over artifacts outside the process model being searched, in a manner similar to Transact-SQL’s xp_cmdshell() [9], but without its security issues.
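The pseudo code figure is not reproduced here, but the Excel-based approach it would replace can be sketched in Python: match a regular expression against exported workflow text, then intersect the systems behind the matching workflows with an end-of-life list to obtain a migration short-list. The row layout, system names, and end-of-life list are hypothetical. Note that alternation needs parentheses; a bracketed pattern such as [part|BOM] is a single-character class that matches any one of those letters.

```python
import re

# Hypothetical rows as exported to Excel: one row per workflow.
ROWS = [
    {"workflow": "W1", "activity": "PR-849 Purchase Parts",
     "description": "BOM data e-mailed to purchasing", "system": "LegacyERP"},
    {"workflow": "W2", "activity": "Review styling",
     "description": "Styling sketches for review", "system": "LegacyCAD"},
    {"workflow": "W3", "activity": "Release parts",
     "description": "Part release data for plant", "system": "PlantMES"},
]
END_OF_LIFE = {"LegacyERP", "LegacyCAD"}  # hypothetical end-of-life inventory

# Case-insensitive matching is an assumption made for this sketch.
PATTERN = re.compile(r"(part|BOM).*data", re.IGNORECASE)

def candidate_workflows(rows: list[dict]) -> list[dict]:
    """Rows whose activity name plus description mention the target data."""
    return [r for r in rows
            if PATTERN.search(f'{r["activity"]} {r["description"]}')]

def migration_shortlist(rows: list[dict], eol: set[str]) -> list[str]:
    """Systems behind candidate workflows that are also end-of-life."""
    return sorted({r["system"] for r in candidate_workflows(rows)} & eol)
```

On these rows, W1 and W3 match the pattern, but only LegacyERP is also on the end-of-life list, so the short-list is ["LegacyERP"].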

The end goal was an inventory of service-ready activities (SrActivities). In ProVision, when a process modeler drags an SrActivity onto the process designer canvas, the tool validates and instantiates the web service interface to the correct micro-service. A proof-of-concept was produced in ProVision using the Excel approach. SrActivities were inventoried separately from non-service-ready activities, and web service interfaces were generated. Thus, it was possible to measure progress towards migrating activity inputs from legacy information systems and external MS Office documents (passed by email) to micro-services.

4.4 Filters to Normalize Models for Vendor-Neutrality

The long-term preservation of model assets is an important part of corporate records retention. Enterprise model artifacts must be portable across evolving modeling tools. Unfortunately, model fidelity is sometimes lost when exporting/importing models between tools (e.g., using BPMN or XPDL). There is an inherent problem in relying on the import/export features of BPM modeling tools, because tools to manage enterprise content (ECM) and models (e.g., BPM) have different objectives than archival tools [13]. Anticipating the long-term need to preserve models across tools, a repository of rendered model views was created in both PDF and HTML formats. To create these static model views, process queries filtered out tool-specific branding (e.g., ProVision, Activiti, Sparx Enterprise Architect) and aggregated the models published from all vendors’ tools. Thus, stakeholders could focus on unified views of enterprise models independently of the tools used to produce them. In this way, the modeling lifecycle and the modeling-tool vendor management lifecycle could evolve independently.

5 Conclusion

This work focused exclusively on process querying as it relates to BPM modeling, not the mining of process logs, because many organizations are not yet ready for log mining. Even with this narrow focus, there was still much value in applying process querying to models.

5.1 What Worked Well; What Did Not

The most useful query was finding timing gaps (i.e., lags), overlaps (i.e., leads), and errors in workflows between activities. A gap exists when an upstream activity finishes one or more weeks before a downstream activity starts. An overlap exists when both activities execute concurrently for one or more weeks. A workflow error exists when the downstream activity starts before the upstream activity starts, or when one end of a workflow is unattached to a process element (activity, start, end, or gateway). Reducing time-to-market involved iteratively refining process models to close gaps, maximize overlaps, and eliminate errors.

Process models were planned backwards, so the first activity (Design vehicle concept) had negative start and finish times, and the last activity (Confirm ready-to-manufacture) had a finish time of 0. Thus, given two activities $A_i = [S_i, F_i]$ and $A_j = [S_j, F_j]$, where $A$ = Activity, $S$ = Start, $F$ = Finish, and $A_j$ depends on input from $A_i$, a Gap exists when $S_j > F_i$, an Overlap exists when $S_i \le S_j < F_i$, and an Error exists when $S_j < S_i$.
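The timing rules above can be expressed as a small classifier over one workflow, from an upstream activity to the downstream activity that consumes its output. This Python sketch follows the verbal definitions in the previous paragraph, assumes integer week offsets on a timeline that increases toward launch (negative before launch, 0 at the end), and treats the one-week threshold as a parameter; the names are illustrative.

```python
from typing import NamedTuple

class Activity(NamedTuple):
    name: str
    start: int   # weeks relative to launch (negative = before launch)
    finish: int

def classify(up: Activity, down: Activity, min_weeks: int = 1) -> str:
    """Classify the workflow up -> down, where `down` consumes `up`'s output."""
    # Checked first: a downstream start before the upstream start would
    # otherwise also satisfy the overlap test below.
    if down.start < up.start:
        return "error"
    if down.start - up.finish >= min_weeks:
        return "gap"      # idle time between upstream finish and downstream start
    if up.finish - down.start >= min_weeks:
        return "overlap"  # both activities run concurrently for min_weeks or more
    return "ok"           # hand-off within less than min_weeks either way

up = Activity("Design vehicle concept", -100, -90)
```

For this upstream activity, a downstream activity spanning weeks -80 to -70 is a gap, one spanning -95 to -85 is an overlap, and one starting at -105 is an error. Sorting the classified workflows by the size of each gap or overlap yields a prioritized list like the one described below.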

ProVision version 9.2 has a defect when importing model data from Excel: it does not load the activity.workTime column into the process model’s activities, which prevents automated critical-path analysis. To circumvent this problem, workflows between activities were inventoried in Excel with one row per workflow, sorted by activity.workTime to prioritize leads, lags, and errors between adjacent process activities. With this prioritized list in hand, filtered views of process maps were created to highlight timing problems.

Regarding performance, the models exported by ProVision in its common interchange XML format were big, often over 100 MB. Loading them into OxygenXML Designer and ProVision led to non-linear processing delays (and occasional crashes), which seemed to grow exponentially with file size as each model was loaded into memory. In XML processing, streaming performs better than loading large DOMs [10]. This is a consideration when designing process query languages and PQL processors.
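A streaming parse keeps memory bounded by discarding each subtree once it has been processed. The Python sketch below uses the standard library's xml.etree.ElementTree.iterparse; the simplified, namespace-free layout is hypothetical (namespaced CIF tags would appear as {namespace}activity), and no timing claims are made for real ProVision exports.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def stream_activity_authors(source):
    """Yield (activity id, author) pairs while parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "activity":
            # At the "end" event the whole <activity> subtree is available.
            record = (elem.get("id"), elem.findtext("author", default="").strip())
            elem.clear()  # free the processed subtree; an empty shell remains
            yield record

DATA = b"""<model>
  <activity id="A1"><author>Ann</author></activity>
  <activity id="A2"><author>Bob</author></activity>
  <activity id="A3"><author></author></activity>
</model>"""

# Count activities per author without holding the full model in memory.
counts = Counter(author or "<missing>"
                 for _id, author in stream_activity_authors(io.BytesIO(DATA)))
```

In practice, source would be a path or an open binary file rather than an in-memory buffer.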

The manipulation stage of process querying involved labor-intensive batch work for developers, which proved challenging. The turnaround time to produce a filtered process layer could be as much as 20 min. Layers were created using shell scripts with regular expressions, experiments with XSL, and manual processing. An area of future work would be to automate model manipulation so that stakeholders could execute ad-hoc PQL queries on the fly to explore and navigate models. Process modeling tools would have to support a PQL-compliant API in order to dynamically render ad-hoc queries.

5.2 Limitations, Open Problems, and Lessons Learned

Resources such as code snippets are available on GitHub [11]. Areas for future research include:

  • Improving XML processing performance—not enough effort was spent on measuring model size vs. processing time.

  • PQL portability across modeling tools—based on this experience using a specific BPM modeling tool (ProVision and its common interchange format), it is clear that PQL portability across BPM tools will be in demand in industry.

  • Support for ad-hoc queries and model navigation—the enterprise portal was popular among stakeholders. However, as implemented, it was limited in its ability to render dynamically-generated model views on the fly in response to ad-hoc queries. Such queries support exploration and discovery of interconnected models and are valuable to stakeholders.

  • Model design drift and compliance—just as process instances drift during execution, so too does design intent drift when subject matter experts design and maintain large, complex process models over years. There was interest in archiving model changes for corporate history and in measuring the cost of model drift [12]. A big driver is tracking process compliance and alignment to industry standards, particularly the APQC standard for automotive manufacturing processes.

  • The biggest lesson learned from stakeholder feedback was that PQL techniques are applicable to all TOGAF enterprise architecture layers, not just the process layer. Thus, an open problem and area for future research is whether and how to apply PQL techniques to performing queries in the broader context of enterprise architecture models.