DOI: 10.1145/3472674.3473978
Research article

Comparing within- and cross-project machine learning algorithms for code smell detection

Published: 23 August 2021

ABSTRACT

Code smells represent a well-known problem in software engineering, since they are a notorious cause of reduced comprehensibility and maintainability. The most recent efforts in devising automatic machine learning-based code smell detection techniques have achieved unsatisfactory results so far. This could be explained by the fact that all these approaches follow a within-project classification, i.e., training and test data are taken from the same source project. Combined with the imbalanced nature of the problem, this produces training datasets with very few instances of the minority class (i.e., smelly instances). In this paper, we propose a cross-project machine learning approach and compare its performance with a within-project alternative. The core idea is to use transfer learning to increase the overall number of smelly instances in the training datasets. Our results show that cross-project classification provides performance very similar to within-project classification. Although this finding does not yet represent a step forward in improving the performance of ML techniques for code smell detection, it sets the basis for further investigation.
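
To make the two evaluation setups concrete, the sketch below contrasts within-project validation (train and test folds drawn from a single project) with cross-project, leave-one-project-out validation (training data pooled from all other projects, which also pools their smelly instances). This is a minimal sketch under stated assumptions, not the authors' pipeline: the synthetic metric data, project names, 5% smell ratio, and random-forest classifier are all illustrative placeholders.

```python
# A minimal sketch, assuming scikit-learn and NumPy; NOT the authors'
# implementation. Synthetic metrics, project names, the 5% smell ratio,
# and the random-forest classifier are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

def make_project(n_samples, smell_ratio=0.05):
    """Synthetic stand-in for one project's dataset: 10 numeric code
    metrics and a rare binary 'smelly' label tied to the first metric."""
    X = rng.normal(size=(n_samples, 10))
    threshold = np.quantile(X[:, 0], 1.0 - smell_ratio)
    y = (X[:, 0] >= threshold).astype(int)  # ~5% minority class
    return X, y

projects = {name: make_project(300) for name in ["p1", "p2", "p3", "p4"]}

# Within-project: stratified 10-fold CV inside a single project, so each
# training fold sees only that project's handful of smelly instances.
for name, (X, y) in projects.items():
    clf = RandomForestClassifier(random_state=0)
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"within {name}: mean F1 = {scores.mean():.2f}")

# Cross-project: leave one project out and train on the union of the
# rest, pooling their smelly instances into one larger training set.
for target, (X_test, y_test) in projects.items():
    X_train = np.vstack([X for n, (X, _) in projects.items() if n != target])
    y_train = np.concatenate([y for n, (_, y) in projects.items() if n != target])
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(f"cross  {target}: F1 = {f1_score(y_test, clf.predict(X_test)):.2f}")
```

Note how each cross-project model trains on roughly three times as many smelly instances as any single project contains; that pooling of the minority class is the core idea behind the cross-project setup.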


Published in

MaLTESQuE 2021: Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution
August 2021, 36 pages
ISBN: 9781450386258
DOI: 10.1145/3472674

Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

