VaryMinions: leveraging RNNs to identify variants in event logs

Authors: Xavier Devroey, Patrick Heymans, Gilles Perrouin

MaLTESQuE 2021: Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution

Pages 13 - 18

https://doi.org/10.1145/3472674.3473980

Published: 23 August 2021

Abstract

Business processes have to manage variability in their execution, e.g., to deliver the correct building permit in different municipalities. This variability is visible in event logs, where some sequences of events are shared by the core process (building permit authorisation) while others are specific to each municipality. To rationalise resources (e.g., derive a configurable business process capturing all municipalities’ permit variants) or to debug anomalous behaviour, one must identify the variant to which a given trace belongs. This paper supports this task by training Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks on two datasets: a configurable municipality workflow and a travel expenses workflow. We demonstrate that variability can be identified accurately (>87%) and discuss the challenges of learning highly entangled variants.
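To make the classification task in the abstract concrete, the following is a minimal sketch of how labelled traces might be turned into fixed-length inputs for a recurrent classifier such as an LSTM or GRU. It is an illustration under stated assumptions, not the paper's actual pipeline: the event names, the municipality labels, the right-padding scheme, and the helper names (`build_vocabulary`, `encode_trace`, `one_hot`) are all hypothetical.

```python
from typing import Dict, List, Sequence

PAD = 0  # index reserved for padding; real events are numbered from 1


def build_vocabulary(traces: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct event name to a positive integer, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for trace in traces:
        for event in trace:
            if event not in vocab:
                vocab[event] = len(vocab) + 1
    return vocab


def encode_trace(trace: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode one trace as a list of event indices, truncated or right-padded to max_len."""
    ids = [vocab[event] for event in trace[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def one_hot(label: str, labels: Sequence[str]) -> List[int]:
    """Encode a variant label as a one-hot vector over the known variants."""
    return [1 if label == candidate else 0 for candidate in labels]


# Hypothetical toy log: each entry is (trace, variant the trace belongs to).
log = [
    (["receive_request", "check_documents", "grant_permit"], "municipality_A"),
    (["receive_request", "check_documents", "site_visit", "grant_permit"], "municipality_B"),
    (["receive_request", "reject_request"], "municipality_A"),
]

traces = [trace for trace, _ in log]
labels = sorted({variant for _, variant in log})
vocab = build_vocabulary(traces)
max_len = max(len(trace) for trace in traces)

# X holds one padded integer sequence per trace; y holds the matching one-hot variant labels.
X = [encode_trace(trace, vocab, max_len) for trace in traces]
y = [one_hot(variant, labels) for _, variant in log]
```

In a setup like this, `X` and `y` would typically be passed to an embedding layer followed by an LSTM or GRU layer and a softmax output over the variants. That part is omitted here because it depends on the chosen framework and on hyperparameters the abstract does not state.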

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software reverse engineering

Published In

MaLTESQuE 2021: Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution

August 2021

36 pages

ISBN:9781450386258

DOI:10.1145/3472674

General Chairs:
Apostolos Ampatzoglou (University of Macedonia, Greece),
Daniel Feitosa (University of Groningen, Netherlands),
Gemma Catolino (Tilburg University, Netherlands),
Valentina Lenarduzzi (LUT University, Finland)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Fonds De La Recherche Scientifique - FNRS

Conference

ESEC/FSE '21

Sponsor:

SIGPLAN

ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 23, 2021

Athens, Greece

Article Metrics

Total Citations: 4
Total Downloads: 110
Downloads (last 12 months): 12
Downloads (last 6 weeks): 1

Reflects downloads up to 03 Mar 2025

Cited By

Fortz S, Temple P, Devroey X, Heymans P, Perrouin G (2024). VaryMinions: leveraging RNNs to identify variants in variability-intensive systems’ logs. Empirical Software Engineering 29:4. Online publication date: 15-Jun-2024
https://dl.acm.org/doi/10.1007/s10664-024-10473-5
Tavassoli S, Khosravi R (2024). Efficient construction of family-based behavioral models from adaptively learned models. Software and Systems Modeling 24:1 (225-251). Online publication date: 7-Aug-2024
https://doi.org/10.1007/s10270-024-01199-5
Han R, Zheng D, Yu F, Li Y, Hu J (2024). Biomedical Named Entity Recognition Model Based on Knowledge Distillation. Business Intelligence and Information Technology (443-451). Online publication date: 30-Aug-2024
https://doi.org/10.1007/978-981-97-3980-6_38
Feitosa D, Catolino G, Lenarduzzi V, Ampatzoglou A (2022). MaLTeSQuE 2021 Workshop Summary. ACM SIGSOFT Software Engineering Notes 47:1 (15-17). Online publication date: 25-Jan-2022
https://dl.acm.org/doi/10.1145/3502771.3502777
