An Assessment of Machine Learning Algorithms and Models for Prediction of Change-Prone Java Methods

Published: 25 September 2023

Abstract

Identifying which parts of the code are prone to change during software evolution allows developers to prioritize and allocate resources efficiently. Focusing on a smaller scope eases change management and allows monitoring the type of modification and its impact. However, existing change-proneness prediction approaches focus mainly on classes. Classes aggregate many characteristics of different software attributes, and some software behaviors are more granular and better captured at the method level. Motivated by these facts, in this paper we empirically assess the performance of four machine learning algorithms for change-prone method prediction in seven open-source software projects. We derived and compared models obtained with three sets of independent variables (features): a set composed of structural metrics, a second set composed of evolution-based metrics, and a third that combines both kinds of metrics. The results show that Random Forest presents the best overall performance, regardless of the indicator and set of features used. The model built with both sets of metrics outperforms the other two, and two features based on the frequency of changes in the evolution history of the method stand out as the most important for our problem.
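The experimental setup described above (training a classifier on structural metrics, evolution-based metrics, and their combination, then comparing performance) can be sketched with scikit-learn. This is an illustrative sketch only, not the authors' pipeline: the feature values and labels below are synthetic placeholders, and the label is deliberately correlated with an "evolution" feature to mimic the kind of signal the paper reports.

```python
# Hedged sketch: comparing a Random Forest on three hypothetical feature sets
# (structural, evolution-based, combined). Data is synthetic, for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
# Placeholder features: columns stand in for structural metrics (e.g., size,
# complexity) and evolution-based metrics (e.g., change frequency).
structural = rng.normal(size=(n, 2))
evolution = rng.normal(size=(n, 2))
# Synthetic binary label "change-prone", driven mostly by an evolution feature.
y = (evolution[:, 0] + 0.5 * structural[:, 0]
     + rng.normal(scale=0.5, size=n) > 0).astype(int)

feature_sets = {
    "structural": structural,
    "evolution": evolution,
    "combined": np.hstack([structural, evolution]),
}
scores = {}
for name, X in feature_sets.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # 5-fold cross-validated AUC as the comparison indicator.
    scores[name] = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {scores[name]:.3f}")
```

In a real replication the features would come from tools such as CK (structural metrics) and PyDriller (evolution history), and the evaluation would use the paper's projects and indicators rather than a single synthetic AUC comparison.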


    Published In

    SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
September 2023, 570 pages
ISBN: 9798400707872
DOI: 10.1145/3613372

    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Software maintenance
    2. machine learning
    3. software metrics


    Conference

SBES 2023: XXXVII Brazilian Symposium on Software Engineering
    September 25 - 29, 2023
    Campo Grande, Brazil

    Acceptance Rates

    Overall Acceptance Rate 147 of 427 submissions, 34%
