DOI: 10.1145/3555228.3555249
research-article

Machine Learning for Change-Prone Class Prediction: A History-Based Approach

Published: 05 October 2022

Abstract

Classes have a very dynamic life cycle in object-oriented software projects. They can be created, modified, or removed for different reasons. Predicting change-prone classes in the early stages of a project positively impacts the team's productivity, the allocation of resources, and the quality of the software developed. Existing work uses Machine Learning (ML) and different kinds of class metrics. However, a limitation of existing work is that it does not consider the temporal dependency between instances in the datasets. To fill this gap, this work introduces an approach based on the change history of each class across different releases from public repositories. The approach uses the Sliding Window method and adopts as predictors structural and evolutionary metrics, as well as the frequency and diversity of smells. Five projects and four ML algorithms are used in the evaluation. In the great majority of cases, our approach outperforms a traditional approach with respect to all the indicators considered. Random Forest presents the best performance, and the use of smell-related information does not impact the results.
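
To make the Sliding Window idea concrete, the following is a minimal sketch of how one class's release history might be turned into windowed instances. The abstract does not state the window size, the exact metrics, or how the label is defined, so the `Release` record, the function name `sliding_window_instances`, and the choice to take the label from the last release in each window are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical record for one class in one release:
# (metric values for that release, 1 if the class changed afterwards else 0)
Release = Tuple[Sequence[float], int]


def sliding_window_instances(
    history: Sequence[Release], window: int
) -> List[Tuple[List[float], int]]:
    """Build windowed instances from a class's chronologically ordered history.

    Each instance concatenates the metric vectors of `window` consecutive
    releases. The label is taken from the last release in the window
    (an assumption made for this sketch).
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    instances: List[Tuple[List[float], int]] = []
    # Slide one release at a time; histories shorter than the window yield nothing.
    for start in range(len(history) - window + 1):
        chunk = history[start:start + window]
        features = [value for metrics, _ in chunk for value in metrics]
        label = chunk[-1][1]
        instances.append((features, label))
    return instances


# Made-up history of one class over four releases, two metrics per release.
example_history = [
    ([10.0, 2.0], 0),
    ([12.0, 2.0], 1),
    ([15.0, 3.0], 1),
    ([15.0, 3.0], 0),
]
example_instances = sliding_window_instances(example_history, window=3)
```

With a window of three releases, the four-release history above yields two overlapping instances of six features each. Feature vectors built this way could then be passed to any classifier, such as a Random Forest, so that each prediction sees several consecutive releases instead of a single snapshot.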

    Information

    Published In

    SBES '22: Proceedings of the XXXVI Brazilian Symposium on Software Engineering
    October 2022
    457 pages
    ISBN:9781450397353
    DOI:10.1145/3555228

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. class change proneness
    2. machine learning
    3. temporal dependency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • CAPES and CNPq Brazil

    Conference

    SBES 2022
    SBES 2022: XXXVI Brazilian Symposium on Software Engineering
    October 5 - 7, 2022
    Virtual Event, Brazil

    Acceptance Rates

    Overall Acceptance Rate 147 of 427 submissions, 34%
