Boosting crash-inducing change localization with rank-performance-based feature subset selection

Guo, Zhaoqiang; Li, Yanhui; Ma, Wanwangying; Zhou, Yuming; Lu, Hongmin; Chen, Lin; Xu, Baowen

doi:10.1007/s10664-020-09802-1

Published: 02 March 2020

Volume 25, pages 1905–1950, (2020)

Empirical Software Engineering


Abstract

Given a bucket of crash reports, automatically locating the crash-inducing software changes would help developers find and fix the corresponding defects. Recently, an approach called ChangeLocator was proposed, which uses ten change-level features to train a supervised model on data from historical fixed crashes. ChangeLocator was reported to achieve good performance in terms of Recall@1, MAP, and MRR when all ten features were combined. However, ChangeLocator neglects the redundancy between features, which may degrade localization effectiveness. In this paper, we propose an improved approach, ChangeRanker, which uses a rank-performance-based feature selection technique (Rfs) to boost the effectiveness of crash-inducing change localization. Our experimental results on NetBeans show that ChangeRanker achieves improvements of 35.9%, 17.4%, and 15.3% over ChangeLocator in terms of Recall@1, MRR, and MAP, respectively. Furthermore, compared with three popular feature selection approaches, Rfs selects more informative features and thereby boosts localization effectiveness. To assess the generalization capability of the proposed extension, we adapt ChangeRanker and ChangeLocator to locate bug-inducing changes on three additional data sets. Again, we observe that, on average, ChangeRanker achieves improvements of 115.3%, 37.6%, and 41.2% in terms of Recall@1, MRR, and MAP, respectively. This indicates that the proposed rank-performance-based feature selection method generalizes well. In summary, our work provides an easy-to-use approach to boosting the performance of the state-of-the-art crash-inducing change localization approach.
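The three measures named in the abstract are standard ranking metrics. The following is a minimal sketch of their usual information-retrieval definitions, not the authors' evaluation scripts. It assumes each crash bucket is represented as a ranked list of booleans, where True marks a crash-inducing change. The function names and the sample data are hypothetical, and the exact definitions used in the paper may differ in detail.

```python
from typing import Sequence

# Assumption: one ranking per crash bucket, best-ranked candidate first;
# True marks a change that is actually crash-inducing.
Ranking = Sequence[bool]


def recall_at_k(rankings: Sequence[Ranking], k: int = 1) -> float:
    """Fraction of buckets with at least one inducing change in the top k."""
    hits = sum(1 for r in rankings if any(r[:k]))
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Ranking]) -> float:
    """Mean of 1/position of the first inducing change (0 if none is found)."""
    total = 0.0
    for r in rankings:
        for pos, relevant in enumerate(r, start=1):
            if relevant:
                total += 1.0 / pos
                break
    return total / len(rankings)


def average_precision(r: Ranking) -> float:
    """Mean of precision@pos over the positions of all inducing changes."""
    hits = 0
    precision_sum = 0.0
    for pos, relevant in enumerate(r, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / pos
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Ranking]) -> float:
    """Mean of the per-bucket average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical data: three buckets with ranked candidate changes.
    sample = [
        [True, False, False],
        [False, True, False],
        [False, False, True, True],
    ]
    print(round(recall_at_k(sample, 1), 3))          # 0.333
    print(round(mean_reciprocal_rank(sample), 3))    # 0.611
    print(round(mean_average_precision(sample), 3))  # 0.639
```

Recall@1 only rewards placing an inducing change at the very top, whereas MRR and MAP also credit rankings that place it near the top. This is one reason the three measures are typically reported together.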

Notes

CrashPad [online]. Available: https://chromium.googlesource.com/crashpad/crashpad/+/master/README.md
Technical note tn2123: Crashreporter (2015) [online]. Available: developer.apple.com/library/mac/#technotes/tn2004/tn2123.html
NetBeans project [online]. Available: https://netbeans.org/
The data sets and scripts can be downloaded from https://github.com/Naplues/ChangeRanker.

Acknowledgements

This work is partially supported by the National Key R&D Program of China (2018YFB1003901) and the National Natural Science Foundation of China (61772259, 61872177, 61832009, 61772263).

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Zhaoqiang Guo, Yanhui Li, Wanwangying Ma, Yuming Zhou, Hongmin Lu, Lin Chen & Baowen Xu

Corresponding author

Correspondence to Yuming Zhou.

Additional information

Guest Editor: Martin Monperrus

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Guo, Z., Li, Y., Ma, W. et al. Boosting crash-inducing change localization with rank-performance-based feature subset selection. Empir Software Eng 25, 1905–1950 (2020). https://doi.org/10.1007/s10664-020-09802-1

Published: 02 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10664-020-09802-1
