ABSTRACT
Software Product Lines (SPLs) are generally specified using a Feature Model (FM), an artifact designed in the early stages of the SPL development life cycle. This artifact can quickly become too complex, which makes the SPL challenging to maintain. It is therefore essential to evaluate the artifact's maintainability continuously. The literature offers some approaches that evaluate FM maintainability by aggregating maintainability measures. Machine Learning (ML) models can be used to build such approaches, since they aggregate the values of independent variables into a single target value, also called the dependent variable. Moreover, white-box ML models make it possible to interpret and explain the model's results. This work proposes white-box ML models to classify FM maintainability based on 15 measures. To build the models, we performed the following steps: (i) we compared two approaches for evaluating FM maintainability against a human-based oracle of FM maintainability classifications; (ii) we used the best approach to pre-classify the ML training dataset; (iii) we generated three ML models and compared them on classification accuracy, precision, recall, F1, and AUC-ROC; and (iv) we used the best model to create a mechanism capable of providing improvement indicators to domain engineers. The best model used the decision tree algorithm and obtained accuracy, precision, and recall of 0.81, an F1-score of 0.79, and an AUC-ROC of 0.91. Using this model, we could reduce the number of measures needed to evaluate FM maintainability from 15 to 9.
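As an illustration of the comparison metrics named in step (iii), the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed for a multi-class classifier. It assumes support-weighted averaging across classes, under which recall always equals accuracy; whether the study used this averaging scheme is an assumption, not something the abstract states. The function name `weighted_scores`, the class labels, and the label lists are hypothetical and are not data from the study.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true instances per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and total predictions for this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec_c = tp / predicted if predicted else 0.0
        rec_c = tp / count
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        # Weight each per-class score by the class's share of the data.
        weight = count / n
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical maintainability labels, for illustration only.
y_true = ["good", "good", "good", "moderate", "moderate", "bad", "bad", "bad"]
y_pred = ["good", "good", "moderate", "moderate", "moderate", "bad", "good", "bad"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

With support weighting, the per-class recalls sum to the fraction of correctly classified instances, which is why the sketch yields identical accuracy and recall. AUC-ROC is omitted because it requires predicted class probabilities rather than hard labels.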
Index Terms
- A machine learning model to classify the feature model maintainability