Abstract
Background. Code-line-level bugginess identification (CLBI) is a vital technique that helps developers locate buggy lines without expending a large amount of manual effort. Most existing studies mine characteristics of source code to train supervised prediction models, which have been reported to distinguish buggy code lines from non-buggy ones in a target program.
Problem. However, several simple and intuitive code characteristics, such as the complexity of code lines, have been overlooked in the current literature. Such characteristics can be easily acquired and applied in an unsupervised way to conduct more accurate CLBI, which can also reduce the application cost of existing CLBI approaches by a large margin.
Objective. We aim to investigate the status quo of CLBI from two perspectives: (1) how far we have really come in the literature, and (2) how far we have yet to go in industry, by analyzing the performance of state-of-the-art (SOTA) CLBI approaches and tools, respectively.
Method. We propose GLANCE, a simple heuristic baseline solution that leverages easily obtained code characteristics, such as the complexity of code lines, in an unsupervised way, and compare it against existing SOTA CLBI approaches and tools.
Result. Based on 19 open-source projects with 142 releases, the experimental results show that GLANCE achieves prediction performance comparable or even superior to the existing SOTA CLBI approaches and tools in terms of 8 performance indicators.
Conclusion. The results caution us that, if identification performance is the goal, real progress in CLBI has not been achieved as envisaged in the literature, and there is still a long way to go to improve the effectiveness of static analysis tools in industry. We therefore suggest using GLANCE as a baseline in future studies to demonstrate the usefulness of any newly proposed CLBI approach.
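To make the idea concrete, below is a minimal, hypothetical sketch (in Python) of an unsupervised line-ranking heuristic in the spirit described above: it scores each code line by an easily computed complexity proxy (here, a simple token count) and flags the highest-scoring lines for inspection first. The token-count proxy, the `top_fraction` threshold, and all function names are illustrative assumptions, not the actual GLANCE implementation described in the paper.

```python
# Hypothetical sketch of an unsupervised line-ranking heuristic.
# NOT the actual GLANCE implementation; it only illustrates scoring code
# lines by an easily computed complexity proxy and inspecting the
# highest-scoring lines first.
import re
from typing import List, Tuple


def line_complexity(line: str) -> int:
    """Approximate a line's complexity by counting word tokens and operators."""
    return len(re.findall(r"\w+|[^\s\w]", line))


def rank_lines(source: str, top_fraction: float = 0.2) -> List[Tuple[int, str]]:
    """Return the top_fraction most 'complex' non-blank lines, most complex first."""
    scored = [
        (idx + 1, line)
        for idx, line in enumerate(source.splitlines())
        if line.strip()  # ignore blank lines
    ]
    scored.sort(key=lambda item: line_complexity(item[1]), reverse=True)
    cutoff = max(1, int(len(scored) * top_fraction))
    return scored[:cutoff]


if __name__ == "__main__":
    snippet = (
        "int total = 0;\n"
        "for (int i = 0; i < items.length && !done; i++) {\n"
        "    total += items[i].weight * factor(i, limit);\n"
        "}\n"
    )
    for lineno, text in rank_lines(snippet):
        print(f"line {lineno}: {text.strip()}")
```

Such a heuristic requires no training data, which is what makes it suitable as a low-cost baseline against which supervised CLBI models can be compared.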