A systematic approach for performance assessment using process mining

An industrial experience report

  • Experience Report
  • Published in: Empirical Software Engineering (2018)

Abstract

Software performance engineering is a mature field that offers methods to assess system performance. Process mining is a promising research field applied to gain insight into system processes. The interplay of these two fields opens promising applications in industry. In this work, we report our experience applying a methodology, based on process mining techniques, for the performance assessment of a commercial data-intensive software application. The methodology has successfully assessed the scalability of future versions of this system. Moreover, it has identified bottleneck components and replication needs for fulfilling business rules. The system, an integrated port operations management system, has been developed by Prodevelop, a medium-sized software enterprise with high expertise in geospatial technologies. The performance assessment has been carried out by a team composed of practitioners and researchers. Finally, the paper offers a deep discussion of the lessons learned during the experience, which will be useful for practitioners wanting to adopt the methodology and for researchers looking for new research directions.

Notes

  1. Unified Modeling Language

  2. Modeling and Analysis of Real-time and Embedded Systems

  3. XES is the XML-based IEEE 1849-2016 standard format for event logs.

  4. In the observation period, from June 18th to August 11th, 2017, the dates of the logs are not contiguous since logs were not available for a few days.

  5. Observe that the normal distribution is considered instead of the Student's t-distribution since the sample size is large (i.e., N ≫ 30); the corresponding interval formulas are shown after these notes.

  6. The Student t-distribution with N-1 = 15 degrees of freedom has been used.

  7. The percentage corresponds to the workload that is completely processed by the parsing process, and it has been computed as the ratio between the filtered and unfiltered logs.

  8. The minimum (maximum) values of the interval correspond to 20% of the minimum (maximum) throughput of the parsing process, estimated with the analysis of the parsing scenario.
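
For reference, notes 5 and 6 correspond to the two standard confidence-interval constructions for a mean, shown below with the usual textbook notation (sample mean \(\bar{x}\), sample standard deviation s, sample size N, confidence level 1 − α); these formulas are added here for clarity and are not taken from the paper itself.

$$\bar{x} \pm z_{1-\alpha/2}\,\frac{s}{\sqrt{N}} \quad \text{(large sample, note 5)} \qquad\qquad \bar{x} \pm t_{N-1,\,1-\alpha/2}\,\frac{s}{\sqrt{N}} \quad \text{(small sample, note 6)}$$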

References

  • Adriansyah A, Buijs J (2012) Mining process performance from event logs: the BPI Challenge 2012 case study. BPM reports, no. 1215. http://repository.tue.nl/91892e24-93b6-4b38-bae5-e4394e171bf7

  • AISencoding (2012) Automatic Identification System - Encoding Guide. http://www.uscg.mil/hq/cg5/TVNCOE/Documents/links/AIS.EncodingGuide.pdf, accessed 04/29/2016

  • AISreceiver (2016) Installation and Quick Reference Guide - SRL-200/G AIS Receiver. http://www.comarsystems.com/brochures/Installation, accessed 04/29/2016

  • AIVDM/AIVDO (2015) AIVDM/AIVDO protocol decoding. http://catb.org/gpsd/AIVDM.html, accessed 04/29/2016

  • Ajmone-Marsan M, Balbo G, Conte G, Donatelli S, Franceschinis G (1994) Modelling with Generalized Stochastic Petri Nets, 1st edn. Wiley, New York

  • Law AM (2015) Simulation Modeling and Analysis. McGraw-Hill, New York

  • Balsamo S, Di Marco A, Inverardi P, Simeoni M (2004) Model-based performance prediction in software development: A survey. IEEE Trans Softw Eng 30(5):295–310

  • Bass L, Weber I, Zhu L (2015) DevOps: A software architect’s perspective. Addison-Wesley Professional, Boston

  • Becker S, Koziolek H, Reussner R (2009) The Palladio Component Model for Model-driven Performance Prediction. J Syst Softw 82(1):3–22

  • Bernardi S, Campos J, Merseguer J (2011) Timing-Failure Risk Assessment of UML Design Using Time Petri Net Bound Techniques. IEEE Trans Ind Inf 7(1):90–104

  • Bernardi S, Merseguer J, Petriu DC (2012) Dependability modeling and analysis of software systems specified with UML. ACM Comput Surv 45(1):1–48

  • Brünink M, Rosenblum DS (2016) Mining performance specifications. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, pp 39–49

  • Brunnert A, Krcmar H (2017) Continuous performance evaluation and capacity planning using resource profiles for enterprise applications. J Syst Softw 123:239–262

  • Bugzilla (2017) Papyrus Bug List. https://bugs.eclipse.org/bugs/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&classification=Modeling&component=Diagram&f1=short_desc&f2=short_desc&o1=substring&o2=substring&product=Papyrus&query_format=advanced&v1=Sequence&v2=Diagram, last accessed Jan. 2017

  • Casale G, Ardagna D, Artac M, Barbier F, Nitto ED, Henry A, Iuhasz G, Joubert C, Merseguer J, Munteanu V I, Perez JF, Petcu D, Rossi M, Sheridan C, Spais I, Vladuic D (2015) DICE: Quality-Driven Development of Data-Intensive Cloud Applications. In: 7th IEEE/ACM International Workshop on Modeling in Software Engineering, MiSE 2015, Florence, Italy, May, vol 16-17, pp 78–83

  • Celonis (2011) Celonis PI. https://www.celonis.com

  • Ceravolo P, Damiani E, Torabi M, Barbon S (2017) Toward a new generation of log pre-processing methods for process mining. In: International Conference on Business Process Management, Springer, pp 55–70

  • Cortellessa V, Marco AD, Inverardi P (2011) Model-based software performance analysis, 1st edn. Springer Publishing Company Incorporated, Berlin

  • DICE-D3.1 (2016) Deliverable 3.1: Transformations to Analysis Models. http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2016/08/D3.1_Transformations-to-analysis-models.pdf

  • DICE-D1.6 (2017) Deliverable 1.6: DICE Framework. http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2017/08/D1.6_DICE-framework-Final-version.pdf

  • Dipartimento di informatica Università di Torino (2016) GRaphical Editor and Analyzer for Timed and Stochastic Petri Nets. http://www.di.unito.it/greatspn/index.html - accessed 01/05/2016

  • Diwan A, Hauswirth M, Mytkowicz T, Sweeney PF (2011) TraceAnalyzer: A system for processing performance traces. Software: Practice and Experience 41(3):267–282

  • Ferme V, Pautasso C (2017) Towards holistic continuous software performance assessment. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ACM, New York, NY, USA, ICPE ’17 Companion, pp 159–164

  • Foundation TE (2017) Eclipse - The Eclipse Foundation open source community website. https://eclipse.org/, last accessed Jan. 2017

  • Gómez-Martínez E, Gonzalez-Cabero R, Merseguer J (2014) Performance assessment of an architecture with adaptative interfaces for people with special needs. Empir Softw Eng 19(6):1967–2018. https://doi.org/10.1007/s10664-013-9297-1

  • Gómez A, Joubert C, Merseguer J (2016) A tool for assessing performance requirements of data-intensive applications. In: Proceedings of the XXIV National Conference of Concurrency and Distributed Systems (JCDS 2016), pp 159–169. https://github.com/dice-project/DICE-Simulation, accessed 01/23/2017

  • Günther CW, Rozinat A (2012) Disco: Discover Your Processes. BPM (Demos) 940:40–44

  • Hernández S, van Zelst SJ, Ezpeleta J, van der Aalst W (2015) Handling big(ger) logs: Connecting ProM 6 to Apache Hadoop. In: Proceedings of the BPM Demo Session 2015, CEUR-WS.org, vol 1418, pp 80–84

  • Huber N, Walter J, Bähr M, Kounev S (2015) Model-based autonomic and performance-aware system adaptation in heterogeneous resource environments: A case study. In: International Conference on Cloud and Autonomic Computing, pp 181–191

  • Huber N, Brosig F, Spinner S, Kounev S, Bähr M (2017) Model-based self-aware performance and resource management using the descartes modeling language. IEEE Trans Softw Eng 43(5):432–452

  • ISO (2008) Systems and software engineering – High-level Petri nets – Part 2: Transfer format. ISO/IEC 15909-2:2011, International Organization for Standardization, Geneva, Switzerland

  • Joishi J, Sureka A (2015) Vishleshan: performance comparison and programming process mining algorithms in graph-oriented and relational database query languages. In: Proceedings of the 19th International Database Engineering & Applications Symposium. ACM, pp 192–197

  • Joubert C, Montesinos M, Sanz J (2014) A Comprehensive Port Operations Management System. ERCIM News 2014(97). http://ercim-news.ercim.eu/en97/special/a-comprehensive-port-operations-management-system

  • Kounev S (2006) Performance modeling and evaluation of distributed component-based systems using queueing petri nets. IEEE Trans Softw Eng 32 (7):486–502

  • Kounev S, Huber N, Brosig F, Zhu X (2016) A model-based approach to designing self-aware it systems and infrastructures. IEEE Comput 49(7):53–61. https://doi.org/10.1109/MC.2016.198

  • Lazowska ED, Zahorjan J, Graham GS, Sevcik KC (1984) Quantitative system performance: Computer system analysis using queueing network models. Prentice-Hall, Inc, Upper Saddle River

  • López-Grao J, Merseguer J, Campos J (2004) From UML activity diagrams to Stochastic Petri nets: application to software performance engineering. In: Proceedings of the Fourth International Workshop on Software and Performance, WOSP 2004, Redwood Shores, California, USA, January 14-16, 2004, pp 25–36

  • Maggi F M, Mooij A J, Van der Aalst W (2013) Analyzing vessel behavior using process mining. In: van de Laar P, Tretmans J, Borth M (eds) Situation Awareness with Systems of Systems. Springer, pp 133–148. https://doi.org/10.1007/978-1-4614-6230-9

  • Menasce DA, Almeida VA, Dowdy LW, Dowdy L (2004) Performance by design: computer capacity planning by example. Prentice Hall Professional, Upper Saddle River

  • NIST/SEMATECH (2013) e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/. Accessed 10/15/2016

  • OMG (2011) UML Profile for MARTE: Modeling and Analysis of Real-time Embedded Systems, Version 1.1. Tech. rep., Object Management Group. http://www.omg.org/spec/MARTE/1.1/

  • PNMLframework (2017) PNML Framework. https://github.com/lip6/pnmlframework

  • Prodevelop (1993) Prodevelop- Integrating Tech. https://www.prodevelop.es/en

  • ProM Tools (2017) ProM Tools. http://www.promtools.org/doku.php

  • QPRprocessAnalyzer (2011) QPR process analyzer. https://www.qpr.com

  • RapidMiner (2018) Rapid miner. https://rapidminer.com/

  • RapidProM (2018) RapidProM: Bringing process mining to analytic workflows. http://www.rapidprom.org/

  • Rubin VA, Mitsyuk AA, Lomazova IA, van der Aalst WM (2014) Process mining can be applied to software too!. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ACM

  • Runeson P, Höst M (2008) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131. https://doi.org/10.1007/s10664-008-9102-8

  • Shavor S, D’Anjou J, Fairbrother S, Kehn D, Kellerman J, McCarthy P (2003) The java developer’s guide to eclipse. Addison-Wesley Longman Publishing Co., Inc., Boston

  • Smith CU (1990) Performance Engineering of Software Systems, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston

  • Smith CU, Williams LG (2002) Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison Wesley Longman Publishing Co., Inc., Redwood City

  • Tarvo A, Reiss SP (2018) Automatic performance prediction of multithreaded programs: a simulation approach. Autom Softw Eng 25(1):101–155

  • The Eclipse Foundation (2016) Papyrus. https://eclipse.org/papyrus/

  • UML2 (2015) Unified Modeling Language: Infrastructure. Version 2.5, OMG document: formal/2015-03-01

  • Van Dongen BF et al (2005) The ProM framework: A new era in process mining tool support. In: Applications and Theory of Petri Nets 2005, Springer, pp 444–454

  • Walter J, van Hoorn A, Koziolek H, Okanovic D, Kounev S (2016) Asking “What”?, Automating the “How”?: The Vision of Declarative Performance Engineering. In: Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering, ICPE 2016, Delft, The Netherlands, March 12-16, 2016, pp 91–94

  • Walter J, Stier C, Koziolek H, Kounev S (2017) An expandable extraction framework for architectural performance models. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ACM, New York, NY, USA, ICPE ’17 Companion, pp 165–170

  • XES (2016) Extensible Event Stream. IEEE Task Force on Process Mining. [Online; accessed 18-April-2016]

  • Yu X, Han S, Zhang D, Xie T (2014) Comprehending performance from real-world execution traces: A device-driver case. ACM SIGPLAN Notices 49(4):193–206

  • Van der Aalst W (2011) Process mining - discovery, conformance and enhancement of business processes. Springer, Berlin. https://doi.org/10.1007/978-3-642-19345-3

  • Van der Aalst W (2014) Process Mining in the Large: A Tutorial. In: Business Intelligence: Third European Summer School, eBISS 2013, Dagstuhl Castle, Germany, July 7-12, 2013, Tutorial Lectures. Springer International Publishing, pp 33–76

  • Van der Aalst W, Adriansyah A, van Dongen B (2012) Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(2):182–192

Acknowledgements

Special thanks to Ismael Torres and Sergi Soler, from Prodevelop, for their help in collecting the logs of the case study. This work has been supported by the European Commission under the H2020 Research and Innovation program [DICE, Grant Agreement No. 644869], the Spanish Ministry of Economy and Competitiveness [ref. CyCriSec-TIN2014-58457-R], and the Aragonese Government [ref. T94, DIStributed COmputation (DISCO)].

Author information

Corresponding author

Correspondence to Simona Bernardi.

Additional information

Communicated by: Vittorio Cortellessa

Appendix: Generalized Stochastic Petri Nets

In this appendix, we introduce the modeling formalism of Generalized Stochastic Petri Nets (GSPN).

A Generalized Stochastic Petri net (GSPN) (Ajmone-Marsan et al. 1994) is a bipartite graph, formally defined as an 8-tuple \(\mathcal {N} = (P,T,I,O,H,{\Phi },W,M_{0})\), where (a minimal data-structure sketch of this tuple follows the list):

  • P is the set of places,

  • \(T = T_i \cup T_t\) is the set of transitions, partitioned into immediate (\(T_i\)) and timed (\(T_t\)) transitions,

  • \(I, O, H : P \times T \rightarrow \mathbb{N}\) are, respectively, the input, output, and inhibitor arc multiplicity functions,

  • \({\Phi} : T \rightarrow \mathbb{N}\) assigns a priority to each transition: timed transitions have zero priority, while immediate transitions have priority greater than zero,

  • \(W : T \rightarrow \mathbb{R}^{+}\) assigns a weight to each immediate transition and a firing time delay to each timed transition; the firing time delay is the mean value of a negative exponential distribution,

  • \(M_0 : P \rightarrow \mathbb{N}\) assigns the initial number of tokens to each place.
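
To make the formal definition concrete, the following is a minimal sketch, not taken from the paper, of how the 8-tuple could be encoded as a plain data structure; all class and field names (GSPN, inputs, priority, and so on) are our own illustrative choices.

from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Place = str
Transition = str
Marking = Dict[Place, int]                          # M : P -> N (absent key = 0 tokens)
ArcFunction = Dict[Tuple[Place, Transition], int]   # arc multiplicity (absent key = no arc)

@dataclass
class GSPN:
    """Hypothetical encoding of N = (P, T, I, O, H, Phi, W, M0)."""
    places: Set[Place]                                               # P
    immediate: Set[Transition]                                       # T_i
    timed: Set[Transition]                                           # T_t
    inputs: ArcFunction = field(default_factory=dict)                # I
    outputs: ArcFunction = field(default_factory=dict)               # O
    inhibitors: ArcFunction = field(default_factory=dict)            # H
    priority: Dict[Transition, int] = field(default_factory=dict)    # Phi (0 for timed)
    weight: Dict[Transition, float] = field(default_factory=dict)    # W (weight or mean delay)
    initial_marking: Marking = field(default_factory=dict)           # M0

    @property
    def transitions(self) -> Set[Transition]:
        return self.immediate | self.timed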

Figures 6 and 7 show the graphical representation of two GSPN models, where places are depicted as circles, immediate transitions as thin black bars, and timed ones as thick white bars.

Transitions of a GSPN model represent actions or events in the modeled system, whereas places represent pre- and post-conditions for the occurrence of those actions/events. The dynamics of a GSPN model are governed by the concession, enabling, and firing rules of transitions in a marking \(M : P \rightarrow \mathbb{N}\), which is reachable from the initial marking \(M_0\) through the firing of a sequence of transitions.

A transition has concession in a marking M when its input places contain at least as many tokens as the corresponding arc multiplicities, and its inhibitor places contain fewer tokens than the corresponding arc multiplicities.

A transition is enabled in a marking M iff it has concession in M and its priority is greater than or equal to that of every other transition that also has concession in M.
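
Continuing the hypothetical encoding sketched after the definition, the concession and enabling rules just stated translate into two small checks; has_concession and enabled are our own names and assume the GSPN class from the previous sketch.

def has_concession(net: GSPN, t: Transition, m: Marking) -> bool:
    # Concession: every input place holds at least the input arc multiplicity,
    # and every inhibitor place holds strictly fewer tokens than the inhibitor
    # arc multiplicity.
    for p in net.places:
        if m.get(p, 0) < net.inputs.get((p, t), 0):
            return False
        h = net.inhibitors.get((p, t), 0)
        if h > 0 and m.get(p, 0) >= h:
            return False
    return True

def enabled(net: GSPN, m: Marking) -> Set[Transition]:
    # Enabling: among the transitions with concession in m, only those at the
    # highest priority level are enabled.
    candidates = {t for t in net.transitions if has_concession(net, t, m)}
    if not candidates:
        return set()
    top = max(net.priority.get(t, 0) for t in candidates)
    return {t for t in candidates if net.priority.get(t, 0) == top}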

Consequently, only transitions of the same priority level can be enabled in a marking. A transition t, enabled in marking M, may fire, leading to a new marking M′ according to the equation:

$$M^{\prime}(p) = M(p) + O(p,t) - I(p,t), \quad \forall p\in P. $$
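
As an illustration only (still assuming the hypothetical GSPN encoding and the enabled check sketched above), the firing rule applies this equation place by place; the tiny two-place net below moves a token from idle to busy when the timed transition work fires.

def fire(net: GSPN, t: Transition, m: Marking) -> Marking:
    # M'(p) = M(p) + O(p, t) - I(p, t) for every place p.
    assert t in enabled(net, m), "only enabled transitions may fire"
    return {p: m.get(p, 0) + net.outputs.get((p, t), 0) - net.inputs.get((p, t), 0)
            for p in net.places}

# Usage example: one timed transition moving a token between two places.
net = GSPN(places={"idle", "busy"}, immediate=set(), timed={"work"},
           inputs={("idle", "work"): 1}, outputs={("busy", "work"): 1},
           weight={"work": 2.0}, initial_marking={"idle": 1, "busy": 0})
print(fire(net, "work", net.initial_marking))  # idle loses its token, busy gains one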

About this article

Cite this article

Bernardi, S., Domínguez, J.L., Gómez, A. et al. A systematic approach for performance assessment using process mining. Empir Software Eng 23, 3394–3441 (2018). https://doi.org/10.1007/s10664-018-9606-9
