ABSTRACT
Code smells are a well-known problem in software engineering, being a notorious cause of reduced comprehensibility and maintainability. The most recent efforts to devise automatic machine-learning-based code smell detection techniques have so far achieved unsatisfactory results. This may be explained by the fact that all these approaches follow a within-project classification, i.e., training and test data are taken from the same source project; combined with the imbalanced nature of the problem, this produces datasets with very few instances of the minority class (i.e., smelly instances). In this paper, we propose a cross-project machine learning approach and compare its performance with a within-project alternative. The core idea is to use transfer learning to increase the overall number of smelly instances in the training datasets. Our results show that cross-project classification performs very similarly to within-project classification. Although this finding does not yet improve the performance of ML techniques for code smell detection, it sets the basis for further investigations.
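The contrast between the two settings can be illustrated with a minimal sketch. The data, metric features, and classifier below are illustrative assumptions, not the paper's actual setup: each "project" is a synthetic table of code metrics with a highly imbalanced smelly/non-smelly label, the within-project model trains on half of the target project, and the cross-project model pools the remaining projects as training data, which raises the absolute number of smelly instances seen during training.

```python
# Sketch: within-project vs. cross-project training for code smell detection.
# Synthetic data and RandomForestClassifier are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

def make_project(n=200, smelly_ratio=0.05):
    """Synthetic project: rows are classes, columns are code metrics
    (e.g. size, complexity); labels are imbalanced, as in real datasets."""
    X = rng.normal(size=(n, 2))
    y = (rng.random(n) < smelly_ratio).astype(int)
    X[y == 1] += 2.0  # shift smelly instances so they are learnable
    return X, y

projects = [make_project() for _ in range(5)]
target_X, target_y = projects[0]

# Within-project: train and test on the same small, imbalanced project.
Xtr, Xte, ytr, yte = train_test_split(
    target_X, target_y, test_size=0.5, random_state=0)
within = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

# Cross-project: pool all *other* projects as training data,
# increasing the number of smelly training instances.
Xcross = np.vstack([X for X, _ in projects[1:]])
ycross = np.concatenate([y for _, y in projects[1:]])
cross = RandomForestClassifier(random_state=0).fit(Xcross, ycross)

print("within-project accuracy:", within.score(Xte, yte))
print("cross-project accuracy: ", cross.score(Xte, yte))
print("smelly training instances (within vs cross):",
      int(ytr.sum()), "vs", int(ycross.sum()))
```

The key design point is the last print statement: the cross-project pool contains many more minority-class examples than a single project's training split, which is the lever the paper's transfer-learning idea exploits.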
Index Terms: Comparing within- and cross-project machine learning algorithms for code smell detection