On the classification of bug reports to improve bug localization

  • Methodologies and Application
Soft Computing

Abstract

Bug localization is the automated process of finding the files in a software project that are likely to be faulty, which lets developers concentrate on the files that matter. Information retrieval (IR)-based approaches have been proposed to help identify software defects automatically from bug report information. However, bug reports that are not semantically related to the relevant code are not helpful to IR-based systems, and running an IR-based system on such reports can lead to false-positive results. In this paper, we propose a classification model that labels a bug report as either uninformative or informative. Our approach helps lower false positives and improve ranking performance by filtering out uninformative reports before running an IR-based bug localization system. The model combines implicit features learned from bug reports using neural networks with explicit features defined manually. We test the proposed model on three open-source software projects that contain over 9000 bug reports. The evaluation results show that our model improves the performance of a developed IR-based system in the trade-off between precision and recall. For implicit features, our comparisons show that the LSTM network performs better than the CNN and the multilayer perceptron with respect to the F-measure. Combining implicit and explicit features outperforms using implicit features alone. Our classification model helps improve precision in bug localization tasks when precision is considered more important than recall.
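The filter-then-localize idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the toy corpus `SOURCE_FILES`, the term-overlap ranker `ir_rank` (a stand-in for a real IR model), and the rule in `is_informative` (a hand-written stand-in for the paper's classifier, which combines LSTM-learned implicit features with manually defined explicit features). The top-1 definitions of precision and recall are likewise one simple choice, not necessarily the paper's.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy corpus: file name -> terms appearing in that file.
SOURCE_FILES: Dict[str, str] = {
    "Parser.java": "parser token parse syntax error",
    "Renderer.java": "render canvas draw paint color",
}

# Hypothetical bug reports with their known buggy file.
REPORTS: List[dict] = [
    {"text": "Parser throws syntax error on empty token", "buggy_file": "Parser.java"},
    {"text": "It crashes sometimes, please fix", "buggy_file": "Renderer.java"},
    {"text": "Canvas draw uses wrong color", "buggy_file": "Renderer.java"},
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def ir_rank(report_text: str) -> List[str]:
    """Rank files by term overlap with the report (stand-in for a real IR model)."""
    query = tokenize(report_text)
    scores = {name: len(query & tokenize(body)) for name, body in SOURCE_FILES.items()}
    # Highest overlap first; ties are broken alphabetically by file name.
    return sorted(scores, key=lambda name: (-scores[name], name))


def is_informative(report_text: str, min_overlap: int = 2) -> bool:
    """Assumed rule: a report is informative if it shares enough terms with the code."""
    vocabulary = set().union(*(tokenize(body) for body in SOURCE_FILES.values()))
    return len(tokenize(report_text) & vocabulary) >= min_overlap


def localize(report_text: str, use_filter: bool = True) -> Optional[List[str]]:
    """Return a ranking, or None when the filter skips the report."""
    if use_filter and not is_informative(report_text):
        return None
    return ir_rank(report_text)


def precision_recall(reports: List[dict], use_filter: bool = True) -> Tuple[float, float]:
    """Top-1 precision over answered reports and top-1 recall over all reports."""
    rankings = [(r, localize(r["text"], use_filter)) for r in reports]
    answered = [(r, ranking) for r, ranking in rankings if ranking is not None]
    hits = sum(1 for r, ranking in answered if ranking[0] == r["buggy_file"])
    precision = hits / len(answered) if answered else 0.0
    recall = hits / len(reports) if reports else 0.0
    return precision, recall
```

On this toy data, the vague second report shares no terms with the code, so the unfiltered ranker returns an arbitrary top file and produces a false positive. With the filter enabled that report is skipped, which raises top-1 precision while leaving the number of correct localizations unchanged. In general a filter can also reject reports that the IR system would have localized correctly, which is the precision/recall trade-off the abstract refers to.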

Notes

  1. https://bugs.eclipse.org/bugs/.

  2. https://bugs.eclipse.org/bugs/show_bug.cgi?id=305571.

  3. https://lucene.apache.org/core/2_9_4/scoring.html.

  4. https://dumps.wikimedia.org/enwiki/.

  5. https://www.tensorflow.org/tutorials/recurrent.

Funding

This study was partially funded by Rochester Institute of Technology, SRS Proposal Number 17080438. This work was carried out at both California State University and Rochester Institute of Technology.

Author information

Corresponding author

Correspondence to Mohamed Wiem Mkaouer.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical standard

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fang, F., Wu, J., Li, Y. et al. On the classification of bug reports to improve bug localization. Soft Comput 25, 7307–7323 (2021). https://doi.org/10.1007/s00500-021-05689-2

  • DOI: https://doi.org/10.1007/s00500-021-05689-2
