Abstract
In recent years, researchers and practitioners have been studying the impact of test smells on test maintenance. However, there is still limited empirical evidence on why developers remove test smells during software maintenance and on the mechanisms they use to address them. In this paper, we conduct an empirical study of 12 real-world open-source systems to examine the evolution and maintenance of test smells and how test smells relate to software quality. Our results show that: 1) Although the number of test smell instances increases as systems evolve, test smell density decreases. 2) However, our qualitative analysis of the removed test smells reveals that most test smell removals (83%) are a by-product of feature maintenance activities. 45% of the removed test smells relocate to other test cases due to refactoring, while developers deliberately address only 17% of the test smell instances, consisting largely of Exception Catch/Throw and Sleepy Test. 3) Our statistical model shows that test smell metrics can provide additional explanatory power for post-release defects over traditional baseline metrics (an average increase of 8.25% in AUC). However, most types of test smells have a minimal effect on post-release defects. Our study provides insight into how developers resolve test smells and into current test maintenance practices. Future studies on test smells may consider focusing on the specific types of test smells that may be more strongly correlated with defect-proneness when helping developers with test code maintenance.
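The abstract reports explanatory power as AUC (area under the ROC curve). As a hedged illustration only (the study itself used R, and this is not the authors' code), AUC can be computed from a model's predicted defect-proneness scores with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen defective file is scored higher than a randomly chosen clean file. The function name `auc` and the score lists below are hypothetical toy values, not data from the study.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (defective, clean) pairs in which the
    defective file receives the higher score; ties count as half a win.
    labels: 1 = file had a post-release defect, 0 = clean (hypothetical coding)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from a baseline model and from a model that also
# includes test smell metrics, for the same four files.
labels = [1, 1, 0, 0]
baseline = [0.6, 0.4, 0.5, 0.3]
extended = [0.7, 0.6, 0.5, 0.2]
print(auc(baseline, labels))  # 0.75
print(auc(extended, labels))  # 1.0
```

The pairwise loop is quadratic in the number of files, which is fine for a sketch; production code would normally sort the scores or use a library routine instead.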




Notes
Logistic regression fitted with the lrm function of the rms R package.
Redundancy analysis performed with the redun function of the Hmisc R package.
VIF analysis performed with the VIF function of the regclass R package.
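The notes mention a variance inflation factor (VIF) analysis, which checks whether a predictor is largely explained by the other predictors before they enter a logistic regression model. A minimal sketch, assuming plain Python in place of the R package the study used (this is not the authors' code): for predictor j, regress it on all other predictors with ordinary least squares and compute VIF_j = 1 / (1 - R_j^2). The helper names `_solve`, `r_squared`, and `vif` are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, xs):
    """R^2 of an OLS fit of y on the columns in xs, with an intercept term."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # zero for a constant column
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Return the VIF of each predictor column (a list of equal-length lists)."""
    result = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        r2 = r_squared(col, others)
        result.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return result


# Hypothetical metric columns: the first two are uncorrelated.
print(vif([[1, -1, 1, -1], [1, 1, -1, -1]]))  # [1.0, 1.0]
```

With only two predictors, each VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation, which is a convenient way to sanity-check the general routine. Which VIF threshold the study applied is not stated in this section.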
Communicated by: Andy Zaidman
Cite this article
Kim, D.J., Chen, T.-H. (P.) & Yang, J. The secret life of test smells - an empirical study on test smell evolution and maintenance. Empir Software Eng 26, 100 (2021). https://doi.org/10.1007/s10664-021-09969-1
DOI: https://doi.org/10.1007/s10664-021-09969-1