DOI: 10.1145/3472674.3473980
research-article

VaryMinions: leveraging RNNs to identify variants in event logs

Published: 23 August 2021

Abstract

Business processes have to manage variability in their execution, e.g., to deliver the correct building permit in different municipalities. This variability is visible in event logs, where some sequences of events are shared by the core process (building permit authorisation) while others are specific to each municipality. To rationalise resources (e.g., derive a configurable business process capturing all municipalities’ permit variants) or to debug anomalous behaviour, it is necessary to identify the variant to which a given trace belongs. This paper supports this task by training Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks on two datasets: a configurable municipality process and a travel expenses workflow. We demonstrate that variability can be identified accurately (>87%) and discuss the challenges of learning highly entangled variants.
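
As a rough illustration of the classification setup described above, the following Python sketch shows one way traces from an event log could be turned into fixed-length integer sequences with variant labels, which is the usual input shape for an LSTM or GRU classifier. The toy log, activity names, variant labels, and helper functions are all hypothetical; the abstract does not specify the encoding the authors used.

```python
# Hypothetical toy event log: (variant label, trace) pairs.
# Activity names and labels are invented for illustration only;
# they are not taken from the datasets used in the paper.
LOG = [
    ("municipality_1", ["register", "check", "approve", "send_permit"]),
    ("municipality_2", ["register", "check", "request_info", "approve", "send_permit"]),
    ("municipality_1", ["register", "check", "reject"]),
]

PAD = 0  # index reserved for padding positions


def build_vocab(traces):
    """Map each distinct event name to a positive integer (0 is kept for padding)."""
    events = sorted({event for trace in traces for event in trace})
    return {event: index + 1 for index, event in enumerate(events)}


def encode(trace, vocab, max_len):
    """Integer-encode a trace and right-pad it with PAD up to max_len."""
    ids = [vocab[event] for event in trace]
    return ids + [PAD] * (max_len - len(ids))


def one_hot(label, labels):
    """One-hot encode a variant label (assumes one variant per trace)."""
    return [1 if label == candidate else 0 for candidate in labels]


traces = [trace for _, trace in LOG]
vocab = build_vocab(traces)
max_len = max(len(trace) for trace in traces)
labels = sorted({label for label, _ in LOG})

X = [encode(trace, vocab, max_len) for trace in traces]  # model inputs
y = [one_hot(label, labels) for label, _ in LOG]         # model targets
```

Each row of X could then be passed to a recurrent network's embedding layer, with y as the classification target. If a trace can belong to several variants at once, a multi-hot label would be needed instead of the one-hot label assumed here.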


Published In

MaLTESQuE 2021: Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution
August 2021
36 pages
ISBN:9781450386258
DOI:10.1145/3472674
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Configurable processes
  2. Recurrent Neural Networks
  3. Variability Mining

Conference

ESEC/FSE '21

Article Metrics

  • Downloads (Last 12 months): 12
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) VaryMinions: leveraging RNNs to identify variants in variability-intensive systems’ logs. Empirical Software Engineering 29:4. DOI: 10.1007/s10664-024-10473-5. Online publication date: 15-Jun-2024
  • (2024) Efficient construction of family-based behavioral models from adaptively learned models. Software and Systems Modeling 24:1 (225-251). DOI: 10.1007/s10270-024-01199-5. Online publication date: 7-Aug-2024
  • (2024) Biomedical Named Entity Recognition Model Based on Knowledge Distillation. Business Intelligence and Information Technology (443-451). DOI: 10.1007/978-981-97-3980-6_38. Online publication date: 30-Aug-2024
  • (2022) MaLTeSQuE 2021 Workshop Summary. ACM SIGSOFT Software Engineering Notes 47:1 (15-17). DOI: 10.1145/3502771.3502777. Online publication date: 25-Jan-2022
