DOI: 10.1145/3336294.3336303
research-article

Process Mining to Unleash Variability Management: Discovering Configuration Workflows Using Logs

Published: 09 September 2019

Abstract

Variability models are used to build configurators: programs that guide users through the configuration process to reach a desired configuration that fulfils their requirements. The same variability model can be used to design different configurators employing different techniques. One of the elements that can change in a configurator is the configuration workflow, i.e., the order in which the different configuration elements are presented to the configuration stakeholders. When developing a configurator, a challenge is to decide which configuration workflow best suits stakeholders according to previous configurations. For example, when configuring a Linux distribution, the configuration process starts by choosing the network or the graphics card, and then other packages following a given sequence. In this paper, we present COnfiguration workfLOw proceSS mIning (COLOSSI), an automated technique that, given a set of logs of previous configurations and a variability model, helps determine the configuration workflow that best fits the configuration logs generated by user activities. The technique is based on process discovery, commonly used in the process mining area, adapted to configuration contexts. Our proposal is validated using existing data from an ERP configuration environment, showing its feasibility. Furthermore, we open the door to new applications of process mining techniques in different areas of software product line engineering.
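
To make the idea of discovering a workflow from configuration logs more concrete, the sketch below counts a directly-follows relation over a few traces. This is not the COLOSSI technique itself, only a minimal illustration of the kind of input that process discovery algorithms commonly start from. The feature names, the `logs` data, and the helper functions `directly_follows` and `start_features` are all hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical configuration logs: each trace is the order in which one
# user selected features (names are illustrative, not from the paper).
logs = [
    ["network", "graphics", "audio"],
    ["network", "audio", "graphics"],
    ["network", "graphics", "audio"],
]


def directly_follows(traces):
    """Count how often feature b is selected immediately after feature a."""
    counts = Counter()
    for trace in traces:
        # Pair each selection with the one that comes right after it.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def start_features(traces):
    """Count which feature opens each non-empty configuration trace."""
    return Counter(trace[0] for trace in traces if trace)


dfg = directly_follows(logs)
starts = start_features(logs)

# List the observed orderings, most frequent first.
for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

Under these assumptions, frequent pairs suggest an ordering that a configurator could present to users, while rare pairs may reflect noise or individual preference. A real discovery algorithm would go further and also account for the constraints in the variability model.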

    Published In

    SPLC '19: Proceedings of the 23rd International Systems and Software Product Line Conference - Volume A
    September 2019
    356 pages
    ISBN:9781450371384
    DOI:10.1145/3336294
    © 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. configuration workflow
    3. process discovery
    4. process mining
    5. variability

    Qualifiers

    • Research-article

    Funding Sources

    • Junta de Andalucia
    • Cátedra de Telefónica Inteligencia en la Red
    • European Regional Development Fund (ERDF/FEDER)
    • Ministry of Science and Technology of Spain
    • MINECO - Juan de la Cierva postdoctoral program

    Conference

    SPLC 2019

    Acceptance Rates

    Overall Acceptance Rate 167 of 463 submissions, 36%

    Cited By

    • (2024) VaryMinions: leveraging RNNs to identify variants in variability-intensive systems’ logs. Empirical Software Engineering 29:4. DOI: 10.1007/s10664-024-10473-5. Online publication date: 15-Jun-2024
    • (2024) ERP Logs and Its Use for Process Mining Student Learning Purposes. Innovative Technologies and Learning (185-192). DOI: 10.1007/978-3-031-65881-5_20. Online publication date: 21-Jul-2024
    • (2023) Exploring the Impact of Data Mining Techniques in Healthcare and Medical Data: A Systematic Literature Review. 2023 9th International Engineering Conference on Sustainable Technology and Development (IEC) (206-211). DOI: 10.1109/IEC57380.2023.10438817. Online publication date: 21-Feb-2023
    • (2022) Reasoning on the usage control security policies over data artifact business process models. Computer Science and Information Systems 19:2 (547-572). DOI: 10.2298/CSIS210217061E. Online publication date: 2022
    • (2021) VaryMinions: leveraging RNNs to identify variants in event logs. Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution (13-18). DOI: 10.1145/3472674.3473980. Online publication date: 23-Aug-2021
    • (2021) CARMEN. Computers in Industry 132:C. DOI: 10.1016/j.compind.2021.103524. Online publication date: 1-Nov-2021
    • (2021) Explanations for over-constrained problems using QuickXPlain with speculative executions. Journal of Intelligent Information Systems. DOI: 10.1007/s10844-021-00675-4. Online publication date: 6-Nov-2021
    • (2021) Discovering configuration workflows from existing logs using process mining. Empirical Software Engineering 26:1. DOI: 10.1007/s10664-020-09911-x. Online publication date: 26-Jan-2021
    • (2020) Process Mining with Applications to Automotive Industry. IOP Conference Series: Materials Science and Engineering 924 (012033). DOI: 10.1088/1757-899X/924/1/012033. Online publication date: 14-Oct-2020
    • (2020) Definition and Verification of Security Configurations of Cyber-Physical Systems. Computer Security (135-155). DOI: 10.1007/978-3-030-64330-0_9. Online publication date: 17-Dec-2020
