DOI: 10.1145/3461001.3471152
research-article

A machine learning model to classify the feature model maintainability

Published: 06 September 2021

ABSTRACT

Software Product Lines (SPLs) are generally specified using a Feature Model (FM), an artifact designed in the early stages of the SPL development life cycle. This artifact can quickly become too complex, which makes the SPL challenging to maintain. It is therefore essential to evaluate the artifact's maintainability continuously. The literature offers some approaches that evaluate FM maintainability by aggregating maintainability measures. Machine Learning (ML) models can be used to build such approaches: they aggregate the values of independent variables into a single target value, also called the dependent variable. Moreover, white-box ML models make it possible to interpret and explain the model's results. This work proposes white-box ML models that classify FM maintainability based on 15 measures. To build the models, we performed the following steps: (i) we compared two approaches for evaluating FM maintainability, using a human-based oracle of FM maintainability classifications; (ii) we used the best approach to pre-classify the ML training dataset; (iii) we generated three ML models and compared them in terms of classification accuracy, precision, recall, F1, and AUC-ROC; and (iv) we used the best model to create a mechanism capable of providing improvement indicators to domain engineers. The best model used the decision tree algorithm and obtained accuracy, precision, and recall of 0.81, an F1-score of 0.79, and an AUC-ROC of 0.91. Using this model, we reduced the number of measures needed to evaluate FM maintainability from 15 to 9.
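
As a rough illustration of two ideas in the abstract, the sketch below shows (a) how a white-box tree can report the rules behind a classification, which is the kind of information an improvement indicator could build on, and (b) how accuracy, precision, recall, and F1 can be computed for a multi-class classifier. Everything specific here is an assumption made for illustration: the measure names (`num_features`, `cross_tree_constraints`, `tree_depth`), the thresholds, the class labels, and the toy data are hypothetical and are not taken from the paper, and the support-weighted averaging is only one common convention that the paper may or may not use. AUC-ROC is left out because it needs class scores rather than hard labels.

```python
from collections import Counter


def classify_fm(measures):
    """Hand-written stand-in for a learned decision tree.

    Measure names, thresholds, and class labels are hypothetical;
    none of them come from the paper.
    Returns the predicted class and the list of rules that fired.
    """
    path = []
    if measures["num_features"] <= 30:
        path.append("num_features <= 30")
        if measures["cross_tree_constraints"] <= 5:
            path.append("cross_tree_constraints <= 5")
            return "good", path
        path.append("cross_tree_constraints > 5")
        return "moderate", path
    path.append("num_features > 30")
    if measures["tree_depth"] <= 6:
        path.append("tree_depth <= 6")
        return "moderate", path
    path.append("tree_depth > 6")
    return "poor", path


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_l = tp / (tp + fp) if tp + fp else 0.0
        r_l = tp / (tp + fn) if tp + fn else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if p_l + r_l else 0.0
        weight = support[label] / n  # classes absent from y_true get weight 0
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical feature model: the returned path names the rules that fired.
label, path = classify_fm(
    {"num_features": 42, "cross_tree_constraints": 3, "tree_depth": 8}
)

# Toy oracle labels versus toy predictions (invented data).
y_true = ["good", "good", "moderate", "poor"]
y_pred = ["good", "moderate", "moderate", "poor"]
scores = weighted_scores(y_true, y_pred)
```

With support-weighted averaging, recall always equals accuracy, because each class's recall is weighted by its share of the true labels. Precision and F1 can still differ from accuracy under this convention.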


Published in

SPLC '21: Proceedings of the 25th ACM International Systems and Software Product Line Conference - Volume A
September 2021
239 pages
ISBN: 9781450384698
DOI: 10.1145/3461001

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 167 of 463 submissions, 36%