Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics

Mhawish, Mohammad Y.; Gupta, Manjari

doi:10.1007/s11390-020-0323-7

Regular Paper
Published: 30 November 2020

Volume 35, pages 1428–1445, (2020)
Journal of Computer Science and Technology

Mohammad Y. Mhawish¹ &
Manjari Gupta¹

Abstract

Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning model's predictions and to improve their interpretability. The datasets obtained from Fontana et al. were reformed and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) perform better than kernel-based and network-based algorithms. The genetic algorithm based feature selection methods enhance the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly enhances the accuracy of all these algorithms. Finally, machine learning techniques show high potential for predicting code smells, which contributes to detecting these smells and enhancing software quality.
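The evaluation protocol named in the abstract (10-fold cross-validation combined with grid search over a parameter) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the classifier is a toy one-dimensional k-nearest-neighbour vote rather than Random Forest, the data are synthetic values standing in for a single software metric, and every function name and parameter below is hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors training rows closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cv_accuracy(data, n_neighbors, k=10):
    """Mean held-out accuracy over k folds for one parameter value."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        hits = [1.0 if knn_predict(train, x, n_neighbors) == y else 0.0
                for x, y in test]
        scores.append(mean(hits))
    return mean(scores)


def grid_search(data, grid, k=10):
    """Score every candidate with k-fold CV and return the best one."""
    scores = {p: cv_accuracy(data, p, k) for p in grid}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for one metric (assumption, not the paper's data):
# label 0 = clean, label 1 = smelly, drawn from two separated Gaussians.
rng = random.Random(42)
data = [(rng.gauss(50, 15), 0) for _ in range(100)]
data += [(rng.gauss(120, 15), 1) for _ in range(100)]

best, scores = grid_search(data, [1, 3, 5, 7])
print("best n_neighbors:", best)
print("cross-validated accuracy per candidate:", scores)
```

Every candidate value is scored only on folds that were held out from its training rows, which is the property that makes cross-validated accuracies comparable across parameter settings.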

References

Wiegers K, Beatty J. Software Requirements. Pearson Education, 2013.
Chung L, do Prado Leite J C S. On non-functional requirements in software engineering. In Conceptual Modeling: Foundations and Applications-Essays in Honor of John Mylopoulos, Borgida AT, Chaudhri V, Giorgini P, Yu E (eds.), Springer, 2009, pp.363-379.
Fowler M, Beck K, Brant J, Opdyke W, Roberts D. Refactoring: Improving the Design of Existing Code (1st edition). Addison-Wesley Professional, 1999.
Yamashita A, Moonen L. Exploring the impact of inter-smell relations on software maintainability: An empirical study. In Proc. the 35th Int. Conf. Softw. Eng., May 2013, pp.682-691.
Yamashita A, Counsell S. Code smells as system-level indicators of maintainability: An empirical study. J. Syst. Softw., 2013, 86(10): 2639-2653.
Yamashita A, Moonen L. Do code smells reflect important maintainability aspects? In Proc. the 28th IEEE Int. Conf. Softw. Maintenance, September 2012, pp.306-315.
Sjøberg D I K, Yamashita A, Anda B C D, Mockus A, Dybå T. Quantifying the effect of code smells on maintenance effort. IEEE Trans. Softw. Eng., 2013, 39(8): 1144-1156.
Sahin D, Kessentini M, Bechikh S, Deb K. Code-smells detection as a bi-level problem. ACM Trans. Softw. Eng. Methodol., 2014, 24(1): Article No. 6.
Olbrich S, Cruzes D S, Basili V, Zazworka N. The evolution and impact of code smells: A case study of two open source systems. In Proc. the 3rd International Symposium on Empirical Software Engineering and Measurement, October 2009, pp.390-400.
Olbrich S M, Cruzes D S, Sjøberg D I K. Are all code smells harmful? A study of God Classes and Brain Classes in the evolution of three open source systems. In Proc. the 26th IEEE Int. Conf. Softw. Maintenance, September 2010.
Khomh F, di Penta M, Guéhéneuc Y G. An exploratory study of the impact of code smells on software change-proneness. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.75-84.
Deligiannis I, Stamelos I, Angelis L, Roumeliotis M, Shepperd M. A controlled experiment investigation of an object-oriented design heuristic for maintainability. J. Syst. Softw., 2004, 72(2): 129-143.
Pérez-Castillo R, Piattini M. Analyzing the harmful effect of god class refactoring on power consumption. IEEE Softw., 2014, 31(3): 48-54.
Li W, Shatnawi R. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. J. Syst. Softw., 2007, 80(7): 1120-1128.
Ciupke O. Automatic detection of design problems in object-oriented reengineering. In Proc. the 30th International Conference on Technology of Object-Oriented Languages and Systems, Delivering Quality Software, August 1999, pp.18-32.
Travassos G, Shull F, Fredericks M, Basili V R. Detecting defects in object-oriented designs: Using reading techniques to increase software quality. ACM SIGPLAN Notices, 1999, 34(10): 47-56.
Dashofy E M, van der Hoek A, Taylor R N. A comprehensive approach for the development of modular software architecture description languages. ACM Trans. Softw. Eng. Methodol., 2005, 14(2): 199-245.
Vidal S, Vázquez H, Díaz-Pace J A, Marcos C, Garcia A, Oizumi W. JSpIRIT: A flexible tool for the analysis of code smells. In Proc. the 34th Int. Conf. Chil. Comput. Sci. Soc., November 2016.
Marinescu R. Measurement and quality in object-oriented design. In Proc. the 21st IEEE Int. Conf. Softw. Maintenance, September 2005, pp.701-704.
Moha N, Guéhéneuc Y, Duchien L, le Meur A. DECOR: A method for the specification and detection of code and design smells. IEEE Trans. Softw. Eng., 2010, 36(1): 20-36.
Fontana F A, Zanoni M, Marino A, Mäntylä M V. Code smell detection: Towards a machine learning-based approach. In Proc. the 2013 IEEE Int. Conf. Softw. Maintenance, September 2013, pp.396-399.
Azadi U, Fontana F A, Zanoni M. Machine learning based code smell detection through WekaNose. In Proc. the 40th Int. Conf. Softw. Eng., May 2018, pp.288-289.
Fontana F A, Zanoni M. Code smell severity classification using machine learning techniques. Knowledge-Based Syst., 2017, 128: 43-58.
Fontana F A, Mäntylä M V, Zanoni M, Marino A. Comparing and experimenting machine learning techniques for code smell detection. Empir. Softw. Eng., 2016, 21(3): 1143-1191.
Sharma T, Spinellis D. A survey on software smells. J. Syst. Softw., 2018, 138: 158-173.
Rasool G, Arshad Z. A review of code smell mining techniques. J. Softw. Evol. Process, 2015, 27(11): 867-895.
Fernandes E, Oliveira J, Vale G, Paiva T, Figueiredo E. A review-based comparative study of bad smell detection tools. In Proc. the 20th International Conference on Evaluation and Assessment in Software Engineering, June 2016, Article No. 18.
Fontana F A, Braione P, Zanoni M. Automatic detection of bad smells in code: An experimental assessment. J. Object Technol., 2012, 11(2): Article No. 5.
Ribeiro M T, Singh S, Guestrin C. “Why should I trust you?”: Explaining the predictions of any classifier. https://arxiv.org/abs/1602.04938, Oct. 2020.
Chicco D. Ten quick tips for machine learning in computational biology. BioData Mining, 2017, 10(1): 35.
Marinescu R. Detection strategies: Metrics-based rules for detecting design flaws. In Proc. the 20th IEEE International Conference on Software Maintenance, December 2004, pp.350-359.
Abílio R, Padilha J, Figueiredo E, Costa H. Detecting code smells in software product lines — An exploratory study. In Proc. the 12th International Conference on Information Technology-New Generations, April 2015, pp.433-438.
Fenske W, Schulze S. Code smells revisited: A variability perspective. In Proc. the 9th International Workshop on Variability Modelling of Software-Intensive Systems, January 2015, Article No. 3.
Suryanarayana G, Samarthyam G, Sharma T. Refactoring for Software Design Smells: Managing Technical Debt (1st edition). Morgan Kaufmann, 2014.
Baudry B, Traon Y L, Sunyé G, Jézéquel J M. Measuring and improving design patterns testability. In Proc. the 9th IEEE International Software Metrics Symposium, September 2003.
Langelier G, Sahraoui H, Poulin P. Visualization-based analysis of quality for large-scale software systems. In Proc. the 20th IEEE/ACM International Conference on Automated Software Engineering, November 2005, pp.214-223.
Murphy-Hill E, Black A P. An interactive ambient visualization for code smells. In Proc. the 5th International Symposium on Software Visualization, October 2010, pp.5-14.
de Figueiredo Carneiro G, Silva M, Mara L et al. Identifying code smells with multiple concern views. In Proc. the 24th Brazilian Symposium on Software Engineering, September 2010, pp.128-137.
Kreimer J. Adaptive detection of design flaws. Electron. Notes Theor. Comput. Sci., 2005, 141(4): 117-136.
Amorim L, Costa E, Antunes N, Fonseca B, Ribeiro M. Experience report: Evaluating the effectiveness of decision trees for detecting code smells. In Proc. the 26th IEEE International Symposium on Software Reliability Engineering, November 2015, pp.261-269.
Khomh F, Vaucher S, Guéhéneuc Y G, Sahraoui H. A Bayesian approach for the detection of code and design smells. In Proc. the 9th International Conference on Quality Software, August 2009, pp.305-314.
Khomh F, Vaucher S, Guéhéneuc Y G, Sahraoui H. BDTEX: A GQM-based Bayesian approach for the detection of antipatterns. J. Syst. Softw., 2011, 84(4): 559-572.
Vaucher S, Khomh F, Moha N, Guéhéneuc Y G. Tracking design smells: Lessons from a study of god classes. In Proc. the 16th Working Conference on Reverse Engineering, October 2009, pp.145-154.
Hassaine S, Khomh F, Guéhéneuc Y G, Hamel S. IDS: An immune-inspired approach for the detection of software design smells. In Proc. the 7th International Conference on the Quality of Information and Communications Technology, September 2010, pp.343-348.
Maiga A, Ali N, Bhattacharya N et al. Support vector machines for anti-pattern detection. In Proc. the 27th IEEE/ACM International Conference on Automated Software Engineering, September 2012, pp.278-281.
Maiga A, Ali N, Bhattacharya N, Sabane A, Gueheneuc Y G, Aimeur E. SMURF: A SVM-based incremental anti-pattern detection approach. In Proc. the 19th Working Conference on Reverse Engineering, October 2012, pp.466-475.
Tempero E, Anslow C, Dietrich J et al. The Qualitas Corpus: A curated collection of Java code for empirical studies. In Proc. the 17th Asia Pacific Software Engineering Conference, November 2010, pp.336-345.
Pecorelli F, Palomba F, di Nucci D, de Lucia A. Comparing heuristic and machine learning approaches for metric-based code smell detection. In Proc. the 27th Int. Conf. Progr. Compr., May 2019, pp.93-104.
Wieman R. Anti-Pattern Scanner: An approach to detect anti-patterns and design violations [Master Thesis]. Department of Computer Science, Delft University of Technology, 2011.
Nongpong K. Integrating “code smells” detection with refactoring tool support [Ph.D. Thesis]. University of Wisconsin-Milwaukee, 2012.
Riel A J. Object-Oriented Design Heuristics (1st edition). Addison-Wesley Professional, 1996.
Chawla N V, Bowyer K W, Hall L O, Kegelmeyer W P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res., 2002, 16: 321-357.
Do T D, Hui S C, Fong A C M. Associative classification with prediction confidence. In Proc. the 4th International Conference on Machine Learning and Cybernetics, August 2005, pp.199-208.
Malhotra R. Empirical Research in Software Engineering: Concepts, Analysis, and Applications (1st edition). Chapman and Hall/CRC, 2015.
Forman G, Scholz M, Rajaram S. Feature shaping for linear SVM classifiers. In Proc. the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, pp.299-308.
Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recognit., 2005, 38(12): 2270-2285.
Yang J, Honavar V. Feature subset selection using a genetic algorithm. IEEE Intell. Syst., 1998, 13(2): 44-49.
Cassar I R, Titus N D, Grill W M. An improved genetic algorithm for designing optimal temporal patterns of neural stimulation. J. Neural Eng., 2017, 14(6): Article No. 066013.
Hassanat A, Almohammadi K, Alkafaween E, Abunawas E, Hammouri A, Prasath V B. Choosing mutation and crossover ratios for genetic algorithms — A review with a new dynamic approach. Information, 2019, 10(12): Article No. 390.
Hall M A. Correlation-based feature subset selection for machine learning [Ph.D. Thesis]. Department of Computer Science, The University of Waikato, 1998.
Vapnik V N. An overview of statistical learning theory. IEEE Trans. Neural Networks, 1999, 10(5): 988-999.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444.
Aha D W, Kibler D, Albert M K. Instance-based learning algorithms. Mach. Learn., 1991, 6(1): 37-66.
Rokach L, Maimon O Z. Data Mining with Decision Trees: Theory and Applications. World Scientific, 2007.
Malohlava M, Candel A, Click C, Roark H, Parmar V. Gradient boosting machine with H2O. https://www.h-2o.ai/wp-content/uploads/2018/01/GBM-BOOKLET.pdf, May 2020.
Hsu C W, Chang C C, Lin C J. A practical guide to support vector classification. Technical Report, Taiwan University, 2008. https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, May 2020.
Thomas I L, Allcock G M. Determining the confidence level for a classification. Photogramm. Eng. Remote Sensing, 1984, 50(10): 1491-1496.
Chakraborty S, Tomsett R, Raghavendra R et al. Interpretability of deep learning models: A survey of results. In Proc. the 2017 IEEE SmartWorld Ubiquitous Intell. Comput. Adv. and Trust. Comput. Scalable Comput. and Commun. Cloud Big Data Comput., Internet People Smart City Innov. SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI, August 2017.
Guggulothu T, Moiz S A. Code smell detection using multi-label classification approach. Softw. Qual. J., 2020, 28: 1063-1086.
Kiyak E O, Birant D, Birant K U. Comparison of multilabel classification algorithms for code smell detection. In Proc. the 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, October 2019.
di Nucci D, Palomba F, Tamburri D A, Serebrenik A, de Lucia A. Detecting code smells using machine learning techniques: Are we there yet? In Proc. the 25th IEEE Int. Conf. Softw. Anal. Evol. Reengineering, March 2018, pp.612-621.

Author information

Authors and Affiliations

Computer Science, Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
Mohammad Y. Mhawish & Manjari Gupta

Corresponding author

Correspondence to Mohammad Y. Mhawish.

About this article

Cite this article

Mhawish, M.Y., Gupta, M. Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics. J. Comput. Sci. Technol. 35, 1428–1445 (2020). https://doi.org/10.1007/s11390-020-0323-7

Received: 24 January 2020
Revised: 29 September 2020
Published: 30 November 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s11390-020-0323-7
