Abstract
Software performance engineering is a mature field that offers methods to assess system performance. Process mining is a promising research field applied to gain insight into system processes. The interplay of these two fields opens promising applications in industry. In this work, we report our experience applying a methodology, based on process mining techniques, for the performance assessment of a commercial data-intensive software application. The methodology has successfully assessed the scalability of future versions of this system. Moreover, it has identified bottleneck components and replication needs for fulfilling business rules. The system, an integrated port operations management system, has been developed by Prodevelop, a medium-sized software enterprise with high expertise in geospatial technologies. The performance assessment has been carried out by a team composed of practitioners and researchers. Finally, the paper offers an in-depth discussion of the lessons learned during the experience, which will be useful for practitioners to adopt the methodology and for researchers to find new routes.
Notes
Unified Modeling Language
Modeling and Analysis of Real-time and Embedded Systems
XES is the XML-based IEEE 1849-2016 standard format for event logs.
In the observation period from June 18th to August 11th, 2017, the dates of the logs are not contiguous, since logs were not available for a few days.
Observe that the normal distribution is used instead of Student’s t-distribution since the sample size is large (i.e., N ≫ 30).
The Student t-distribution with N-1 = 15 degrees of freedom has been used.
The percentage corresponds to the workload that is completely processed by the parsing process and it has been computed as the ratio between the filtered and unfiltered logs.
The minimum (maximum) value of the interval corresponds to 20% of the minimum (maximum) throughput of the parsing process, estimated with the analysis of the parsing scenario.
References
Adriansyah A, Buijs J (2012) Mining process performance from event logs: the BPI Challenge 2012 case study. BPM reports, no. 1215. http://repository.tue.nl/91892e24-93b6-4b38-bae5-e4394e171bf7
AISencoding (2012) Automatic Identification System - Encoding Guide. http://www.uscg.mil/hq/cg5/TVNCOE/Documents/links/AIS.EncodingGuide.pdf, accessed 04/29/2016
AISreceiver (2016) Installation and Quick Reference Guide - SRL-200/G AIS Receiver. http://www.comarsystems.com/brochures/Installationaccessed04/29/2016
AIVM/AIVDO (2015) AIVDM/AIVDO protocol decoding. http://catb.org/gpsd/AIVDM.html, accessed 04/29/2016
Ajmone-Marsan M, Balbo G, Conte G, Donatelli S, Franceschinis G (1994) Modelling with Generalized Stochastic Petri Nets, 1st edn. Wiley, New York
Averill ML (2015) Simulation Modeling and Analysis. McGraw-Hill, New York
Balsamo S, Di Marco A, Inverardi P, Simeoni M (2004) Model-based performance prediction in software development: A survey. IEEE Trans Softw Eng 30(5):295–310
Bass L, Weber I, Zhu L (2015) DevOps: A software architect’s perspective. Addison-Wesley Professional, Boston
Becker S, Koziolek H, Reussner R (2009) The Palladio Component Model for Model-driven Performance Prediction. J Syst Softw 82(1):3–22
Bernardi S, Campos J, Merseguer J (2011) Timing-Failure Risk Assessment of UML Design Using Time Petri Net Bound Techniques. IEEE Trans Ind Inf 7(1):90–104
Bernardi S, Merseguer J, Petriu DC (2012) Dependability modeling and analysis of software systems specified with UML. ACM Comput Surv 45(1):1–48
Brünink M, Rosenblum DS (2016) Mining performance specifications. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, pp 39–49
Brunnert A, Krcmar H (2017) Continuous performance evaluation and capacity planning using resource profiles for enterprise applications. J Syst Softw 123:239–262
Bugzilla (2017) Papyrus Bug List. https://bugs.eclipse.org/bugs/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&classification=Modeling&component=Diagram&f1=short_desc&f2=short_desc&o1=substring&o2=substring&product=Papyrus&query_format=advanced&v1=Sequence&v2=Diagram, last accessed Jan. 2017
Casale G, Ardagna D, Artac M, Barbier F, Nitto ED, Henry A, Iuhasz G, Joubert C, Merseguer J, Munteanu V I, Perez JF, Petcu D, Rossi M, Sheridan C, Spais I, Vladuic D (2015) DICE: Quality-Driven Development of Data-Intensive Cloud Applications. In: 7th IEEE/ACM International Workshop on Modeling in Software Engineering, MiSE 2015, Florence, Italy, May, vol 16-17, pp 78–83
Celonis (2011) Celonis PI. https://www.celonis.com
Ceravolo P, Damiani E, Torabi M, Barbon S (2017) Toward a new generation of log pre-processing methods for process mining. In: International Conference on Business Process Management, Springer, pp 55–70
Cortellessa V, Marco AD, Inverardi P (2011) Model-based software performance analysis, 1st edn. Springer Publishing Company Incorporated, Berlin
DICE-D3.1 (2016) Deliverable 3.1: Transformations to Analysis Models. http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2016/08/D3.1_Transformations-to-analysis-models.pdf
DICE-D1.6 (2017) Deliverable 1.6: DICE Framework. http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2017/08/D1.6_DICE-framework-Final-version.pdf
Dipartimento di informatica Università di Torino (2016) GRaphical Editor and Analyzer for Timed and Stochastic Petri Nets. http://www.di.unito.it/greatspn/index.html, accessed 01/05/2016
Diwan A, Hauswirth M, Mytkowicz T, Sweeney PF (2011) TraceAnalyzer: A system for processing performance traces. Softw Pract Exper 41(3):267–282
Ferme V, Pautasso C (2017) Towards holistic continuous software performance assessment. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ACM, New York, NY, USA, ICPE ’17 Companion, pp 159–164
Foundation TE (2017) Eclipse - The Eclipse Foundation open source community website. https://eclipse.org/, last accessed Jan. 2017
Gómez-Martínez E, Gonzalez-Cabero R, Merseguer J (2014) Performance assessment of an architecture with adaptative interfaces for people with special needs. Empir Softw Eng 19(6):1967–2018. https://doi.org/10.1007/s10664-013-9297-1
Gómez A, Joubert C, Merseguer J (2016) A tool for assessing performance requirements of data-intensive applications. In: Proceedings of the XXIV National Conference of Concurrency and Distributed Systems (JCDS 2016), pp 159–169. https://github.com/dice-project/DICE-Simulation, accessed 01/23/2017
Günther CW, Rozinat A (2012) Disco: Discover Your Processes. BPM (Demos) 940:40–44
Hernández S, van Zelst SJ, Ezpeleta J, van der Aalst W (2015) Handling big(ger) logs: Connecting ProM 6 to Apache Hadoop. In: Proceedings of the BPM Demo Session 2015, CEUR-WS.org, vol 1418, pp 80–84
Huber N, Walter J, Bähr M, Kounev S (2015) Model-based autonomic and performance-aware system adaptation in heterogeneous resource environments: A case study. In: International Conference on Cloud and Autonomic Computing, pp 181–191
Huber N, Brosig F, Spinner S, Kounev S, Bähr M (2017) Model-based self-aware performance and resource management using the descartes modeling language. IEEE Trans Softw Eng 43(5):432–452
ISO (2008) Systems and software engineering – High-level Petri nets – Part 2: Transfer format. ISO/IEC 15909-2:2011, International Organization for Standardization, Geneva, Switzerland
Joishi J, Sureka A (2015) Vishleshan: performance comparison and programming process mining algorithms in graph-oriented and relational database query languages. In: Proceedings of the 19th International Database Engineering & Applications Symposium. ACM, pp 192–197
Joubert C, Montesinos M, Sanz J (2014) A Comprehensive Port Operations Management System. ERCIM News 2014(97). http://ercim-news.ercim.eu/en97/special/a-comprehensive-port-operations-management-system
Kounev S (2006) Performance modeling and evaluation of distributed component-based systems using queueing petri nets. IEEE Trans Softw Eng 32(7):486–502
Kounev S, Huber N, Brosig F, Zhu X (2016) A model-based approach to designing self-aware it systems and infrastructures. IEEE Comput 49(7):53–61. https://doi.org/10.1109/MC.2016.198
Lazowska ED, Zahorjan J, Graham GS, Sevcik KC (1984) Quantitative system performance: Computer system analysis using queueing network models. Prentice-Hall, Inc, Upper Saddle River
López-Grao J, Merseguer J, Campos J (2004) From UML activity diagrams to Stochastic Petri nets: application to software performance engineering. In: Proceedings of the Fourth International Workshop on Software and Performance, WOSP 2004, Redwood Shores, California, USA, January 14-16, 2004, pp 25–36
Maggi F M, Mooij A J, Van der Aalst W (2013) Analyzing vessel behavior using process mining. In: van de Laar P, Tretmans J, Borth M (eds) Situation Awareness with Systems of Systems. Springer, pp 133–148. https://doi.org/10.1007/978-1-4614-6230-9
Menasce DA, Almeida VA, Dowdy LW, Dowdy L (2004) Performance by design: computer capacity planning by example. Prentice Hall Professional, Upper Saddle River
NIST/SEMATECH (2013) e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/. Accessed 10/15/2016
OMG (2011) UML Profile for MARTE: Modeling and Analysis of Real-time Embedded Systems, Version 1.1. Tech. rep., Object Management Group. http://www.omg.org/spec/MARTE/1.1/
PNMLframework (2017) PNML Framework. https://github.com/lip6/pnmlframework
Prodevelop (1993) Prodevelop- Integrating Tech. https://www.prodevelop.es/en
ProM Tools (2017) ProM Tools. http://www.promtools.org/doku.php
QPRprocessAnalyzer (2011) QPR process analyzer. https://www.qpr.com
RapidMiner (2018) Rapid miner. https://rapidminer.com/
RapidProM (2018) RapidProM: Bringing process mining to analytic workflows. http://www.rapidprom.org/
Rubin VA, Mitsyuk AA, Lomazova IA, van der Aalst WM (2014) Process mining can be applied to software too!. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ACM
Runeson P, Höst M (2008) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131. https://doi.org/10.1007/s10664-008-9102-8
Shavor S, D’Anjou J, Fairbrother S, Kehn D, Kellerman J, McCarthy P (2003) The java developer’s guide to eclipse. Addison-Wesley Longman Publishing Co., Inc., Boston
Smith CU (1990) Performance Engineering of Software Systems, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston
Smith CU, Williams LG (2002) Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison Wesley Longman Publishing Co., Inc., Redwood City
Tarvo A, Reiss SP (2018) Automatic performance prediction of multithreaded programs: a simulation approach. Autom Softw Eng 25(1):101–155
The Eclipse Foundation (2016) Papyrus. https://eclipse.org/papyrus/
UML2 (2015) Unified Modeling Language: Infrastructure. Version 2.5, OMG document: formal/2015-03-01
Van Dongen BF et al (2005) The ProM framework: A new era in process mining tool support. In: Applications and Theory of Petri Nets 2005, Springer, pp 444–454
Walter J, van Hoorn A, Koziolek H, Okanovic D, Kounev S (2016) Asking “What”?, Automating the “How”?: The Vision of Declarative Performance Engineering. In: Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering, ICPE 2016, Delft, The Netherlands, March 12-16, 2016, pp 91–94
Walter J, Stier C, Koziolek H, Kounev S (2017) An expandable extraction framework for architectural performance models. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ACM, New York, NY, USA, ICPE ’17 Companion, pp 165–170
XES (2016) Extensible Event Stream. IEEE Task Force on Process Mining, [Online; accessed 18-April-2016]
Yu X, Han S, Zhang D, Xie T (2014) Comprehending performance from real-world execution traces: A device-driver case. ACM SIGPLAN Notices 49(4):193–206
Van der Aalst W (2011) Process mining - discovery, conformance and enhancement of business processes. Springer, Berlin. https://doi.org/10.1007/978-3-642-19345-3
Van der Aalst W (2014) Business intelligence: Third european summer school, eBISS 2013, Dagstuhl Castle, Germany, July 7-12, 2013, Tutorial Lectures. In: Springer International Publishing, chap Process Mining in the Large: A Tutorial, pp 33–76
Van der Aalst W, Adriansyah A, van Dongen B (2012) Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(2):182–192
Acknowledgements
Special thanks to Ismael Torres and Sergi Soler, from Prodevelop, for their help in collecting the logs of the case study. This work has been supported by the European Commission under the H2020 Research and Innovation program [DICE, Grant Agreement No. 644869], the Spanish Ministry of Economy and Competitiveness [ref. CyCriSec-TIN2014-58457-R], and the Aragonese Government [ref. T94, DIStributed COmputation (DISCO)].
Additional information
Communicated by: Vittorio Cortellessa
Appendix: Generalized Stochastic Petri Nets
In this appendix, we introduce the modeling formalism of Generalized Stochastic Petri Nets (GSPN).
A Generalized Stochastic Petri Net (GSPN) (Ajmone-Marsan et al. 1994) is a bipartite graph, formally defined as an 8-tuple \(\mathcal {N} = (P,T,I,O,H,{\Phi },W,M_{0})\) where:
- P is the set of places,
- T = Ti ∪ Tt is the set of transitions, divided into immediate (Ti) and timed (Tt) transitions,
- I, O, H : P × T → ℕ are, respectively, the input, output, and inhibitor arc multiplicity functions,
- Φ : T → ℕ assigns a priority to each transition: timed transitions have zero priority, while immediate transitions have priority greater than zero,
- W : T → ℝ assigns to each immediate transition a weight, and to each timed transition a firing time delay. The firing time delay is the mean value of the negative exponential distribution,
- M0 : P → ℕ assigns the initial number of tokens to each place.
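As an illustration only, the tuple defined above can be written down as a small data structure. The sketch below is not part of the tooling used in the study: the class name `GSPN`, the method `check`, and the two-place example net are hypothetical, and the arc functions are stored sparsely (missing entries are read as multiplicity zero).

```python
from dataclasses import dataclass, field


@dataclass
class GSPN:
    """Illustrative container for the tuple (P, T, I, O, H, Phi, W, M0)."""
    places: set                                # P
    immediate: set                             # Ti
    timed: set                                 # Tt
    I: dict = field(default_factory=dict)      # (p, t) -> input arc multiplicity
    O: dict = field(default_factory=dict)      # (p, t) -> output arc multiplicity
    H: dict = field(default_factory=dict)      # (p, t) -> inhibitor arc multiplicity
    Phi: dict = field(default_factory=dict)    # t -> priority
    W: dict = field(default_factory=dict)      # t -> weight (Ti) or mean delay (Tt)
    M0: dict = field(default_factory=dict)     # p -> initial number of tokens

    @property
    def transitions(self):
        """T is the union of the immediate and timed transitions."""
        return self.immediate | self.timed

    def check(self):
        """Raise ValueError if a constraint stated in the definition is violated."""
        if self.immediate & self.timed:
            raise ValueError("Ti and Tt must be disjoint")
        for t in self.timed:
            if self.Phi.get(t, 0) != 0:
                raise ValueError(f"timed transition {t} must have priority 0")
        for t in self.immediate:
            if self.Phi.get(t, 0) <= 0:
                raise ValueError(f"immediate transition {t} needs priority > 0")
        for arcs in (self.I, self.O, self.H):
            for (p, t), mult in arcs.items():
                if p not in self.places or t not in self.transitions or mult < 0:
                    raise ValueError(f"invalid arc ({p}, {t}) -> {mult}")
        return True


# Hypothetical net: requests wait in "queue", an immediate transition "start"
# moves one to "busy" (inhibited while "busy" is occupied), and a timed
# transition "serve" completes it.
net = GSPN(
    places={"queue", "busy"},
    immediate={"start"},
    timed={"serve"},
    I={("queue", "start"): 1, ("busy", "serve"): 1},
    O={("busy", "start"): 1},
    H={("busy", "start"): 1},
    Phi={"start": 1, "serve": 0},
    W={"start": 1.0, "serve": 0.5},
    M0={"queue": 3},
)
net.check()
```

Storing I, O and H as sparse dictionaries keeps the sketch close to the function notation of the definition while avoiding a full P × T matrix.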
Figures 6 and 7 show the graphical representation of two GSPN models, where places are depicted as circles, immediate transitions as thin black bars, and timed ones as thick white bars.
Transitions of a GSPN model represent actions or events in the modeled system, whereas places represent pre- and post-conditions for the occurrence of those actions/events. The dynamics of a GSPN model are governed by the concession, enabling and firing rules of transitions in a marking M : P → ℕ, which is reachable from the initial marking M0 through the firing of a sequence of transitions.
A transition has concession in a marking M when its input places contain at least as many tokens as the corresponding arc multiplicities, and its inhibitor places contain fewer tokens than the corresponding arc multiplicities.
A transition is enabled in a marking M iff it has concession in M, and its priority is greater than or equal to that of every other transition t′ that also has concession in M.
Consequently, only transitions of the same priority level can be enabled in a marking. A transition t that is enabled in marking M may fire, leading to a new marking M′ according to the equation:
\[M^{\prime}(p) = M(p) + O(p,t) - I(p,t), \quad \forall p \in P.\]
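The concession, enabling and firing rules can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of any GSPN tool: the function names and the two-place example net are hypothetical, and the firing step assumes the standard update in which firing t removes I(p, t) tokens from each input place and adds O(p, t) tokens to each output place.

```python
def has_concession(t, M, I, H):
    """Input places hold enough tokens and inhibitor places hold too few to block t."""
    enough = all(M.get(p, 0) >= n for (p, tr), n in I.items() if tr == t)
    free = all(M.get(p, 0) < n for (p, tr), n in H.items() if tr == t and n > 0)
    return enough and free


def enabled(M, transitions, I, H, Phi):
    """Transitions with concession in M whose priority is maximal among them."""
    candidates = [t for t in transitions if has_concession(t, M, I, H)]
    if not candidates:
        return set()
    top = max(Phi.get(t, 0) for t in candidates)
    return {t for t in candidates if Phi.get(t, 0) == top}


def fire(t, M, I, O):
    """Assumed standard update: M'(p) = M(p) + O(p, t) - I(p, t)."""
    new_marking = dict(M)
    for (p, tr), n in I.items():
        if tr == t:
            new_marking[p] = new_marking.get(p, 0) - n
    for (p, tr), n in O.items():
        if tr == t:
            new_marking[p] = new_marking.get(p, 0) + n
    return new_marking


# Hypothetical net: immediate "start" (priority 1) moves a token from "queue"
# to "busy" unless "busy" is occupied; timed "serve" (priority 0) empties "busy".
I = {("queue", "start"): 1, ("busy", "serve"): 1}
O = {("busy", "start"): 1}
H = {("busy", "start"): 1}
Phi = {"start": 1, "serve": 0}
T = {"start", "serve"}

M0 = {"queue": 2, "busy": 0}
M1 = fire("start", M0, I, O)   # one request enters service
M2 = fire("serve", M1, I, O)   # the request completes
```

In this net the inhibitor arc prevents `start` from having concession while `busy` holds a token, so only `serve` is enabled in `M1`. If the inhibitor arc is dropped, both transitions have concession in `M1` and the priority rule leaves only the immediate transition `start` enabled.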
Cite this article
Bernardi, S., Domínguez, J.L., Gómez, A. et al. A systematic approach for performance assessment using process mining. Empir Software Eng 23, 3394–3441 (2018). https://doi.org/10.1007/s10664-018-9606-9
DOI: https://doi.org/10.1007/s10664-018-9606-9