Severity Classification of Code Smells Using Machine-Learning Methods

Dewangan, Seema; Rao, Rajwant Singh; Chowdhuri, Sripriya Roy; Gupta, Manjari

doi:10.1007/s42979-023-01979-8

Original Research
Published: 29 July 2023

SN Computer Science, volume 4, article number 564 (2023)

Seema Dewangan¹,
Rajwant Singh Rao ORCID: orcid.org/0000-0001-6993-8927¹,
Sripriya Roy Chowdhuri² &
Manjari Gupta²

Abstract

Code smell detection can be very useful for minimizing maintenance costs and improving software quality. Developers and researchers interpret the design defects that code smells indicate subjectively and in different ways. Code smell instances also vary in size, intensity, and severity, and these differences deserve attention because they affect software quality to different degrees. This study therefore aims to detect the severity of code smells from code smell datasets. Severity matters when reporting code smell detection performance because it allows refactoring efforts to be prioritized, and it also describes the extent of effort required during software maintenance. We considered four code smell severity datasets: data class, god class, feature envy, and long method. We used four machine-learning and three ensemble learning approaches to identify the severity of code smells. To improve the models’ performance, we used fivefold cross-validation, a Chi-square-based feature selection algorithm, and parameter optimization. We applied two parameter optimization techniques, grid search and random search, and compared their accuracy. The XGBoost model obtained an accuracy of 99.12% using the Chi-square-based feature selection technique on the long method code smell dataset. Overall, the results show that ensemble learning approaches outperform the individual machine-learning approaches for severity detection of code smells.
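The pipeline named in the abstract (Chi-square feature selection, fivefold cross-validation, and grid versus random search) can be illustrated with a minimal sketch in standard-library Python. Everything specific here is an assumption made for illustration and not taken from the paper: the data are synthetic with four severity levels, a small k-nearest-neighbours classifier stands in for the models the authors trained, the parameter grid is hypothetical, and random search is simplified to sampling a subset of that grid.

```python
import itertools
import random
from collections import Counter, defaultdict


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels."""
    n = len(X)
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = prior[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest chi-square scores."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def stratified_folds(y, n_folds=5, seed=0):
    """Split sample indices into folds that keep class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cv_accuracy(X, y, folds, n_features, k):
    """Mean accuracy over the folds; features are selected on training data only."""
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        keep = top_k_features(train_X, train_y, n_features)
        reduced = [[row[j] for j in keep] for row in train_X]
        hits = sum(
            knn_predict(reduced, train_y, [X[i][j] for j in keep], k) == y[i]
            for i in test_idx
        )
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


def best_config(X, y, folds, configs):
    """Return (accuracy, config) for the best-scoring configuration."""
    return max((cv_accuracy(X, y, folds, nf, k), (nf, k)) for nf, k in configs)


# Synthetic stand-in data (assumption): 4 severity levels, feature 0 is
# informative, the remaining 4 features are noise.
rng = random.Random(42)
y = [level for level in range(4) for _ in range(30)]
X = [[2 * level + rng.uniform(0, 3)] + [rng.uniform(0, 5) for _ in range(4)]
     for level in y]

folds = stratified_folds(y, n_folds=5)
grid = list(itertools.product([1, 2, 3], [1, 3, 5]))  # hypothetical (n_features, k) grid
sampled = random.Random(7).sample(grid, 4)            # random search with a budget of 4

grid_acc, grid_cfg = best_config(X, y, folds, grid)
rand_acc, rand_cfg = best_config(X, y, folds, sampled)
print("grid search:  ", grid_cfg, round(grid_acc, 3))
print("random search:", rand_cfg, round(rand_acc, 3))
```

Selecting features inside each training fold, rather than once on the full dataset, keeps the held-out fold from influencing the selection. Because the random search here draws from the same grid and uses the same folds, its best score cannot exceed the grid search score; it simply evaluates fewer configurations.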

Data availability

The dataset can be found at http://essere.disco.unimib.it/reverse/MLCSD.html.

References

Lehman MM. Programs, life cycles, and laws of software evolution. Proc IEEE. 1980;68(9):1060–76.
Wiegers K, Beatty J. Software Requirements. London: Pearson Education; 2013.
Chung L, do Prado Leite JCS. On non-functional requirements in software engineering. In: Borgida AT, Chaudhri VK, Giorgini P, Yu ES, editors. Conceptual modeling: foundations and applications (lecture notes in computer science). Cham: Springer; 2009. p. 363–79.
Fontana FA, Zanoni M. Code smell severity classification using machine learning techniques. Knowl Based Syst. 2017. https://doi.org/10.1016/j.knosys.2017.04.014.
Vidal SA, Marcos C, Díaz-Pace JA. An approach to prioritize code smells for refactoring. Autom Softw Eng. 2016;23(3):501–32. https://doi.org/10.1007/s10515-014-0175-x.
Liu W, Wang S, Chen X, Jiang H. Predicting the severity of bug reports based on feature selection. Int J Softw Eng Knowl Eng. 2018;28(04):537–58. https://doi.org/10.1142/S0218194018500158.
Tiwari O, Joshi R (2020) Functionality based code smell detection and severity classification. In: ISEC 2020: 13th innovations in software engineering conference, pp 1–5. https://doi.org/10.1145/3385032.3385048.
Baarah A, Aloqaily A, Salah Z, Zamzeer M, Sallam M. Machine learning approaches for predicting the severity level of software bug reports in closed source projects. (IJACSA) Int J Adv Comput Sci Appl. 2019;10(8):285–94.
Fontana FA, Mäntylä MV, Zanoni M, Marino A. Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng. 2016;21(3):1143–91.
Mhawish MY, Gupta M. Generating code-smell prediction rules using decision tree algorithm and software metrics. Int J Comput Sci Eng (IJCSE). 2019;7(5):41–8.
Mhawish MY, Gupta M. Predicting code smells and analysis of predictions: using machine learning techniques and software metrics. J Comput Sci Technol. 2020;35(6):1428–45. https://doi.org/10.1007/s11390-020-0323-7.
Kaur I, Kaur A. A novel four-way approach designed with ensemble feature selection for code smell detection. IEEE Access. 2021;9:8695–707. https://doi.org/10.1109/ACCESS.2021.3049823.
Pushpalatha MN, Mrunalini M. Predicting the severity of closed source bug reports using ensemble methods. In: Satapathy S, Bhateja V, Das S, editors. Smart intelligent computing and applications. Smart innovation, systems and technologies, vol. 105. Singapore: Springer; 2019. https://doi.org/10.1007/978-981-13-1927-3_62.
Alazba A, Aljamaan HI. Code smell detection using feature selection and stacking ensemble: an empirical investigation. Inf Softw Technol. 2021;138:106648.
Draz MM, Farhan MS, Abdulkader SN, Gafar MG. Code smell detection using whale optimization algorithm. Comput Mater Continua. 2021;68(2):1919–35.
Dewangan S, Rao RS, Mishra A, Gupta M. A novel approach for code smell detection: an empirical study. IEEE Access. 2021;9:162869–83. https://doi.org/10.1109/ACCESS.2021.3133810.
Reis JPD, Abreu FBE, Carneiro GDF. Crowd smelling: a preliminary study on using collective knowledge in code smells detection. Empir Softw Eng. 2022;27:69. https://doi.org/10.1007/s10664-021-10110-5.
van Oort B, Cruz L, Aniche M, van Deursen A. (2021) The prevalence of code smells in machine learning projects, IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN). Madrid, Spain. p. 1–8. https://doi.org/10.1109/WAIN52551.2021.00011.
Fontana A, Mariani E, Morniroli A, Sormani R, Tonello A. (2011). An experience report on using code smells detection tools. In: IEEE fourth international conference on software testing, verification and validation workshops, RefTest 2011. Berlin: IEEE Computer Society; 2011. p. 450–7. https://doi.org/10.1109/ICSTW.2011.12.
Boutaib S, Elarbi M, Bechikh S, Palomba F, Said LB. A bi-level evolutionary approach for the multi-label detection of smelly classes. In: GECCO 22 companion, July 9–13, 2022, Boston, MA, USA. ACM ISBN 978-1-4503-9268-6/22/07. (2022). https://doi.org/10.1145/3520304.3528946.
Abdou AS, Darwish NR. Early prediction of software defect using ensemble learning: a comparative study. Int J Comput Appl. 2018;179(46):29–40. https://doi.org/10.5120/ijca2018917185.
Dewangan S, Rao RS, Mishra A, Gupta M. Code smell detection using ensemble machine learning algorithms. Appl Sci. 2022;12(20):10321. https://doi.org/10.3390/app122010321.
Dewangan S, Rao RS. Code smell detection using classification approaches. In: Udgata SK, Sethi S, Gao XZ, editors. Intelligent systems; lecture notes in networks and systems, vol. 431. Singapore: Springer; 2022. https://doi.org/10.1007/978-981-19-0901-6_25.
Dewangan S, Rao RS, Yadav PS. Dimensionally reduction based machine learning approaches for code smells detection. In: 2022 international conference on intelligent controller and computing for smart power (ICICCSP); 2022. p. 1–4. https://doi.org/10.1109/ICICCSP53532.2022.9862030. Accessed 30 Jan 2023.
Fowler M. Refactoring: improving the design of existing code. Boston: Addison-Wesley Longman Publishing Co. Inc. http://www.refactoring.com/ (1999).
Gupta A, Chauhan NK. A severity-based classification assessment of code smells in Kotlin and Java application. Arab J Sci Eng. 2022;47:1831–48. https://doi.org/10.1007/s13369-021-06077-6.
Abdou A, Darwish N. Severity classification of software code smells using machine learning techniques: a comparative study. J Softw Evol Proc. 2022. https://doi.org/10.1002/smr.2454.
Hejres S, Hammad M. Code smell severity detection using machine learning. In: 4th smart cities symposium (SCS 2021); 2021. p. 89–96. https://doi.org/10.1049/icp.2022.0320.
Nanda J, Chhabra JK. SSHM: SMOTE-stacked hybrid model for improving severity classification of code smell. Int J Inf Technol. 2022. https://doi.org/10.1007/s41870-022-00943-8.
Tempero E, Anslow C, Dietrich J, Han T, Li J, Lumpe M, Melton H, Noble J. The qualitas corpus: a curated collection of java code for empirical studies. In: Proceedings of the 17th Asia Pacific software engineering conference (APSEC 2010). IEEE Computer Society; 2010. p. 336–45. https://doi.org/10.1109/APSEC.2010.46.
Olbrich S, Cruzes D, Sjoberg DIK. Are all code smells harmful? A study of god classes and brain classes in the evolution of three open source systems. In: Proceedings of the IEEE international conference on software maintenance (ICSM 2010), Timisoara, Romania; 2010. p. 1–10. https://doi.org/10.1109/ICSM.2010.5609564.
Marinescu C, Marinescu R, Mihancea P, Ratiu D, Wettel R. iPlasma: an integrated platform for quality assessment of object-oriented design. In: Proceedings of the 21st IEEE international conference on software maintenance (ICSM 2005) (industrial and tool Proceedings), tool demonstration track. Budapest, Hungary: IEEE; 2005. p. 77–80.
Nongpong K. Integrating “code smell” detection with refactoring tool support. Ph.D. thesis, University of Wisconsin Milwaukee (2012).
Marinescu R. Measurement and quality in object oriented design. Ph.D. thesis, Department of Computer Science, “Polytechnic” University of Timisoara (2002).
Ali PJM, Faraj RH. Data normalization and standardization: a technical report. Mach Learn Tech Rep. 2014;1(1):1–6.
Romero E, Sopena JM. Performing feature selection with multilayer perceptrons. IEEE Trans Neural Netw. 2008;19(3):431–41.
https://www.geeksforgeeks.org/ml-chi-square-test-for-feature-selection. Accessed 30 Jan 2023.
Hsu CW, Chang CC, Lin CJ. A practical guide to support vector classification. Technical Report, Taiwan University; 2008. https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf (2020).
Chaitra PC, Saravana Kumar R. A review of multi-class classification algorithm. Int J Pure Appl Math. 2018;118(14):17–26.
https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_knn_algorithm_finding_nearest_neighbors.htm. Accessed 30 Jan 2023.
https://www.geeksforgeeks.org/boosting-in-machine-learning-boosting-and-adaboost/. (Last Updated: 11 Oct 2021). Retrieved 26 Nov 2021.
https://analyticsindiamag.com/xgboost-internal-working-to-make-decision-trees-and-deduce-predictions/. Last Updated 2 Nov 2020. Retrieved 26 Nov 2021.
https://www.analyticsvidhya.com/blog/2021/04/how-the-gradient-boosting-algorithm-works/. (Last Updated 19 Apr 2021). Retrieved 26 Nov 2021.
https://www.geeksforgeeks.org/paired-t-test-a-detailed-overview/. (Last Updated 28 Feb 2022). Retrieved 26 Jan 2023.
https://towardsdatascience.com/paired-t-test-to-evaluate-machine-learning-classifiers-1f395a6c93fa. (Last Updated 6 July 2022). Retrieved 26 Jan 2023.

Funding

This study was not supported by any other sources.

Author information

Authors and Affiliations

Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, Chhattisgarh, India
Seema Dewangan & Rajwant Singh Rao
Computer Science, DST-Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu University, Varanasi, India
Sripriya Roy Chowdhuri & Manjari Gupta

Contributions

Conceptualization, MG, and RSR; data curation, SD; formal analysis, MG, RSR; investigation, SD, and RSR; methodology, MG, and RSR; supervision, MG, RSR; validation, RSR, SD, and MG; visualization, SD, and RSR; writing, SD, and RSR; review and editing, RSR, MG, and SRC.

Corresponding author

Correspondence to Rajwant Singh Rao.

Ethics declarations

Conflict of Interest

The authors declare no conflicts of interest regarding the publication of this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Appendix

See Table 21.

Table 21 Description of all selected metrics [4]

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dewangan, S., Rao, R.S., Chowdhuri, S.R. et al. Severity Classification of Code Smells Using Machine-Learning Methods. SN COMPUT. SCI. 4, 564 (2023). https://doi.org/10.1007/s42979-023-01979-8

Received: 27 February 2023
Accepted: 30 May 2023
Published: 29 July 2023
DOI: https://doi.org/10.1007/s42979-023-01979-8
