Abstract
Bug localization is the automated process of identifying the files in a software project that are likely to be faulty, which allows developers to concentrate on the files that matter. Information retrieval (IR)-based approaches have been proposed to help identify software defects automatically by using the information in bug reports. However, bug reports that are not semantically related to the relevant code are not helpful to IR-based systems, and running an IR-based localization system on such reports can lead to false-positive results. In this paper, we propose a classification model that labels a bug report as either uninformative or informative. Our approach helps lower false positives and improve ranking performance by filtering out uninformative reports before running an IR-based bug localization system. The model combines implicit features, learned from bug reports using neural networks, with explicit features defined manually. We evaluate the proposed model on three open-source software projects that contain over 9000 bug reports. The evaluation results show that our model enhances the performance of a developed IR-based system in the trade-off between precision and recall. For implicit features, our comparative experiments show that the LSTM network performs better than the CNN and the multilayer perceptron with respect to the F-measure. Combining implicit and explicit features outperforms using implicit features alone. Our classification model helps improve precision in bug localization tasks when precision is considered more important than recall.
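The filter-then-localize idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the scoring function `toy_informativeness`, the `BugReport` type, and the threshold value are hypothetical stand-ins for the learned classifier, and only the standard definitions of precision, recall, and F-measure are taken as given.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class BugReport:
    report_id: int
    text: str


def toy_informativeness(report: BugReport) -> float:
    """Hypothetical explicit feature: share of tokens that look like code.

    A token counts as code-like if it contains an underscore, a pair of
    parentheses, or an uppercase letter after its first character.
    This is an illustrative assumption, not a feature from the paper.
    """
    tokens = report.text.split()
    if not tokens:
        return 0.0
    code_like = sum(
        1
        for t in tokens
        if "_" in t or "()" in t or any(c.isupper() for c in t[1:])
    )
    return code_like / len(tokens)


def filter_informative(
    reports: List[BugReport],
    score: Callable[[BugReport], float],
    threshold: float,
) -> List[BugReport]:
    """Keep only reports scored at or above the threshold.

    Only the kept reports would be passed on to an IR-based localizer.
    """
    return [r for r in reports if score(r) >= threshold]


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall, and F-measure (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


reports = [
    BugReport(1, "NullPointerException in FileParser.parseHeader() when file is empty"),
    BugReport(2, "The app is slow sometimes please fix"),
]
kept = filter_informative(reports, toy_informativeness, threshold=0.2)
print([r.report_id for r in kept])  # prints [1]
```

Raising the threshold in a sketch like this discards more reports, which tends to trade recall for precision; that is the trade-off the abstract refers to.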
Funding
This study was partially funded by Rochester Institute of Technology, SRS Proposal Number 17080438. This work was carried out at both California State University and Rochester Institute of Technology.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standard
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Fang, F., Wu, J., Li, Y. et al. On the classification of bug reports to improve bug localization. Soft Comput 25, 7307–7323 (2021). https://doi.org/10.1007/s00500-021-05689-2
DOI: https://doi.org/10.1007/s00500-021-05689-2