Abstract
Performance bugs impose a heavy cost on both software developers and end-users. Tools that reduce the occurrence, impact, and repair time of performance bugs can therefore provide key assistance to software developers racing to fix these bugs. Classification models that identify defect-prone commits, an approach referred to as Just-In-Time (JIT) Quality Assurance, are known to be useful in allowing developers to review risky commits. These commits can be reviewed while they are still fresh in developers’ minds, reducing the cost of developing high-quality software. JIT models, however, leverage the SZZ approach to determine whether or not a change is bug-inducing. The fixes to performance bugs may be scattered across the source code, separated from their bug-inducing locations; this nature of performance bugs may make SZZ a sub-optimal approach for identifying their bug-inducing commits. Yet, prior studies that leverage or evaluate the SZZ approach do not distinguish performance bugs from other bugs, leading to potential bias in their results. In this paper, we conduct an empirical study on JIT defect prediction for performance bugs. We concentrate on SZZ’s ability to identify the bug-inducing commits of performance bugs in two open-source projects, Cassandra and Hadoop. We verify whether the commits identified by SZZ are truly bug-inducing by manually examining them, cross-referencing fix commits and JIRA bug reports. We evaluate JIT model performance by using the models to identify bug-inducing commits for performance-related bugs. Our findings show that JIT defect prediction classifies non-performance bug-inducing commits better than performance bug-inducing commits, i.e., the SZZ approach does introduce errors when identifying bug-inducing commits. However, we find that manually correcting these errors in the training data only slightly improves the models. In the absence of a large number of correctly labelled performance bug-inducing commits, our findings show that combining all available training data (i.e., truly performance bug-inducing commits, non-performance bug-inducing commits, and non-bug-inducing commits) yields the best classification results.
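For readers unfamiliar with SZZ, the heuristic maps each line deleted by a bug-fixing commit back, via `git blame` on the fix’s parent revision, to the commit that last modified that line; those blamed commits become the candidate bug-inducing commits. The sketch below is a minimal Python illustration of this idea under that assumption, not the SZZ implementation used in the study; the function name and diff-parsing details are ours, and refinements such as filtering whitespace or comment-only changes are omitted.

```python
# Minimal sketch of the SZZ heuristic (illustrative, not the study's tooling).
import subprocess

def szz_candidates(repo, fix_commit):
    """Return the commits that last modified the lines deleted by fix_commit."""
    # Diff of the fix commit with zero context lines and no commit header.
    diff = subprocess.run(
        ["git", "-C", repo, "show", "--unified=0", "--pretty=format:", fix_commit],
        capture_output=True, text=True, check=True).stdout
    candidates, path, old_line = set(), None, 0
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            path = line[len("--- a/"):]          # file the deletions belong to
        elif line.startswith("@@"):
            # Hunk header "@@ -old_start,old_count +new_start,new_count @@".
            old_line = int(line.split()[1].lstrip("-").split(",")[0])
        elif line.startswith("-") and not line.startswith("---") and path:
            # Blame the deleted line in the parent of the fix commit.
            blame = subprocess.run(
                ["git", "-C", repo, "blame", "--porcelain",
                 "-L", f"{old_line},{old_line}", f"{fix_commit}^", "--", path],
                capture_output=True, text=True, check=True).stdout
            candidates.add(blame.split()[0])     # first token is the blamed SHA
            old_line += 1                        # deleted lines advance the old file
    return candidates
```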
Notes
The data files and scripts used in this study are publicly available at: https://github.com/senseconcordia/Perf-JIT-Models
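As a rough, self-contained illustration of the combined-training finding from the abstract (the released scripts above are the authoritative version), the sketch below fits a single classifier on the union of the three commit groups. It assumes scikit-learn and pandas, the column names are illustrative JIT change metrics (diffusion, size, history, and experience measures), and labelling both kinds of bug-inducing commits as the positive class is our assumption, not necessarily the paper’s exact setup.

```python
# Illustrative JIT training on combined data (assumed setup, not the paper's scripts).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Common JIT change metrics (illustrative column names).
METRICS = ["ns", "nd", "nf", "entropy", "la", "ld", "lt",
           "fix", "ndev", "age", "nuc", "exp", "rexp", "sexp"]

def train_on_combined_data(perf_inducing, nonperf_inducing, clean):
    """Fit one model on all available labelled commits (DataFrames of METRICS).

    Labelling both performance and non-performance bug-inducing commits as the
    positive class is an assumption made for this illustration.
    """
    data = pd.concat([perf_inducing.assign(label=1),
                      nonperf_inducing.assign(label=1),
                      clean.assign(label=0)], ignore_index=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    auc = cross_val_score(model, data[METRICS], data["label"],
                          cv=10, scoring="roc_auc").mean()
    return model.fit(data[METRICS], data["label"]), auc
```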
Acknowledgements
This research was partially supported by JSPS KAKENHI Japan (Grant Numbers: 21H04877, JP18H03222) and JSPS International Joint Research Program with SNSF (Project “SENSOR”).
Additional information
Communicated by: Nachiappan Nagappan
Cite this article
Quach, S., Lamothe, M., Adams, B. et al. Evaluating the impact of falsely detected performance bug-inducing changes in JIT models. Empir Software Eng 26, 97 (2021). https://doi.org/10.1007/s10664-021-10004-6