An Assessment of Machine Learning Algorithms and Models for Prediction of Change-Prone Java Methods

Published: 25 September 2023

Abstract

Identifying which parts of the code are prone to change during software evolution allows developers to prioritize and allocate resources efficiently. Focusing on a smaller scope eases change management and allows monitoring the type of modification and its impact. However, existing change-proneness prediction approaches focus mainly on classes. Classes aggregate many characteristics of different software attributes, and some software behaviors are more granular and better captured at the method level. Motivated by these facts, in this paper we empirically assess the performance of four machine learning algorithms for change-prone method prediction in seven open-source software projects. We derived and compared models obtained with three sets of independent variables (features): a set composed of structural metrics, a second set composed of evolution-based metrics, and a third that combines both kinds of metrics. The results show that Random Forest presents the best overall performance, regardless of the indicator and set of features used. The model built with both sets of metrics outperforms the other two, and two features based on the frequency of changes in the evolution history of the method stand out as the most important for our problem.
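The experimental setup described above (training a classifier on structural metrics, evolution-based metrics, and their combination, then comparing performance) can be sketched with scikit-learn. This is an illustrative sketch only, not the authors' pipeline: the feature values and labels below are synthetic placeholders, and the label is deliberately correlated with an "evolution" feature to mimic the kind of signal the paper reports.

```python
# Hedged sketch: comparing a Random Forest on three hypothetical feature sets
# (structural, evolution-based, combined). Data is synthetic, for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
# Placeholder features: columns stand in for structural metrics (e.g., size,
# complexity) and evolution-based metrics (e.g., change frequency).
structural = rng.normal(size=(n, 2))
evolution = rng.normal(size=(n, 2))
# Synthetic binary label "change-prone", driven mostly by an evolution feature.
y = (evolution[:, 0] + 0.5 * structural[:, 0]
     + rng.normal(scale=0.5, size=n) > 0).astype(int)

feature_sets = {
    "structural": structural,
    "evolution": evolution,
    "combined": np.hstack([structural, evolution]),
}
scores = {}
for name, X in feature_sets.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # 5-fold cross-validated AUC as the comparison indicator.
    scores[name] = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {scores[name]:.3f}")
```

In a real replication the features would come from tools such as CK (structural metrics) and PyDriller (evolution history), and the evaluation would use the paper's projects and indicators rather than a single synthetic AUC comparison.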


    Published In

    SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
September 2023, 570 pages
ISBN: 9798400707872
DOI: 10.1145/3613372

    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Software maintenance
    2. machine learning
    3. software metrics


    Conference

SBES 2023: XXXVII Brazilian Symposium on Software Engineering
    September 25 - 29, 2023
    Campo Grande, Brazil

    Acceptance Rates

    Overall Acceptance Rate 147 of 427 submissions, 34%
