Abstract
Locating relevant source files for a given bug report is an important task in software development and maintenance. To make the locating process easier, information retrieval methods have been widely used to compute the content similarities between bug reports and source files. In addition to content similarities, other sources of information, such as the metadata and the stack-trace in the bug report, can be used to enhance localization accuracy. In this paper, we propose a supervised topic modeling approach for automatically locating the relevant source files of a bug report. Our approach takes into account five key observations. First, supervised modeling can effectively make use of existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. Fourth, metainformation brings additional guidance on the search space. Fifth, buggy source files may already be contained in the stack-trace. By integrating these five observations, we experimentally show that the proposed method can achieve up to 67.1% improvement in prediction accuracy over its best competitors and that it scales linearly with the size of the data.






Notes
This work is an extended version of our previous work [3], which considers the first three components. Please refer to the related work section for more details.
In this paper, we interchangeably use ‘document’ and ‘bug report.’
A bug report may relate to multiple source files.
To simplify the processing of source files, we only keep the words in source files that have appeared in the bug reports.
We incorporate these four terms in the model for completeness.
This is exactly the STMLocator method in the previous conference version [3].
References
Le T-DB, Thung F, Lo D (2017) Will this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization tools. Empir Softw Eng 22(4):2237–2279
Zhang X, Yao Y, Wang Y, Xu F, Lu J (2017) Exploring metadata in bug reports for bug localization. In: Asia-Pacific software engineering conference (APSEC), 2017 24th. IEEE, pp 328–337
Wang Y, Yao Y, Tong H, Huo X, Li M, Xu F, Lu J (2018) Bug localization via supervised topic modeling. In: ICDM
Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In: ICSE. IEEE, pp 14–24
Saha RK, Lease M, Khurshid S, Perry DE (2013) Improving bug localization using structured information retrieval. In: ASE. IEEE, pp 345–355
Wang S, Lo D (2014) Version history, similar report, and structure: putting them together for improved bug localization. In: ICPC. ACM, pp 53–63
Wang S, Khomh F, Zou Y (2013) Improving bug localization using correlations in crash reports. In: MSR. IEEE, pp 247–256
Moreno L, Treadway JJ, Marcus A, Shen W (2014) On the use of stack traces to improve text retrieval-based bug localization. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 151–160
Wong C-P, Xiong Y, Zhang H, Hao D, Zhang L, Mei H (2014) Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 181–190
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering. ACM, pp 404–415
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Wu R, Zhang H, Cheung S-C, Kim S (2014) Crashlocator: locating crashing faults based on crash stacks. In: Proceedings of the 2014 international symposium on software testing and analysis. ACM, pp 204–214
Ye X, Bunescu R, Liu C (2014) Learning to rank relevant files for bug reports using domain knowledge. In: The foundations of software engineering. ACM, pp 689–699
Xia X, Lo D, Shihab E, Wang X, Zhou B (2015) Automatic, high accuracy prediction of reopened bugs. Autom Softw Eng 22(1):75–109
Ashok B, Joy J, Liang H, Rajamani SK, Srinivasa G, Vangala V (2009) Debugadvisor: a recommender system for debugging. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 373–382
Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the 6th international conference on Aspect-oriented software development. ACM, pp 212–224
Saha RK, Lawall J, Khurshid S, Perry DE (2014) On the effectiveness of information retrieval based bug localization for c programs. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 161–170
Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent Dirichlet allocation. Inf Softw Technol 52(9):972–990
Nguyen AT, Nguyen TT, Al-Kofahi J, Nguyen HV, Nguyen TN (2011) A topic-based approach for narrowing the search space of buggy files from a bug report. In: ASE. IEEE, pp 263–272
Kim D, Tao Y, Kim S, Zeller A (2013) Where should we fix this bug? A two-phase recommendation model. IEEE Trans Softw Eng 39(11):1597–1610
Liu C, Yan X, Fei L, Han J, Midkiff SP (2005) Sober: statistical model-based bug localization. In: ACM SIGSOFT Software Engineering Notes, vol 30. ACM, pp 286–295
Poshyvanyk D, Gueheneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Softw Eng 33(6):420–432
Youm KC, Ahn J, Kim J, Lee E (2015) Bug localization based on code change histories and bug reports. In: APSEC, pp 190–197
Lam AN, Nguyen AT, Nguyen HA, Nguyen TN (2015) Combining deep learning with information retrieval to localize buggy files for bug reports. In: ASE. IEEE, pp 476–481
Huo X, Li M, Zhou Z-H (2016) Learning unified features from natural and programming languages for locating buggy source code. In: IJCAI, pp 1606–1612
Huo X, Li M (2017) Enhancing the unified features to locate buggy files by exploiting the sequential nature of source code. In: Proceedings of the 26th international joint conference on artificial intelligence. AAAI Press, pp 1909–1915
Lam AN, Nguyen AT, Nguyen HA, Nguyen TN (2017) Bug localization with combination of deep learning and information retrieval. In: 2017 IEEE/ACM 25th International Conference on program comprehension (ICPC). IEEE, pp 218–229
Xiao Y, Keung J, Mi Q, Bennin KE (2017) Improving bug localization with an enhanced convolutional neural network. In: 2017 24th Asia-Pacific software engineering conference (APSEC). IEEE, pp 338–347
Xiao Y, Keung J, Bennin KE, Mi Q (2018) Machine translation-based bug localization technique for bridging lexical gap. Inf Softw Technol 99:58–61
Liblit B, Naik M, Zheng AX, Aiken A, Jordan MI (2005) Scalable statistical bug isolation. ACM SIGPLAN Not 40(6):15–26
Liu C, Fei L, Yan X, Han J, Midkiff SP (2006) Statistical debugging: a hypothesis testing-based approach. IEEE Trans Softw Eng 32(10):831–848
Jones JA, Harrold MJ (2005) Empirical evaluation of the tarantula automatic fault-localization technique. In: ASE. ACM, pp 273–282
Abreu R, Zoeteweij P, Van Gemund AJ (2007) On the accuracy of spectrum-based fault localization. In: TAICPART-MUTATION. IEEE 2007, pp 89–98
Xuan J, Monperrus M (2014) Learning to combine multiple ranking metrics for fault localization. In: ICSME
Ren X, Shah F, Tip F, Ryder BG, Chesley O (2004) Chianti: a tool for change impact analysis of java programs. In: ACM Sigplan Notices, vol. 39, no. 10. ACM, pp 432–448
Chesley OC, Ren X, Ryder BG, Tip F (2007) Crisp—a fault localization tool for java programs. In: 29th international conference on software engineering, 2007 (ICSE 2007). IEEE, pp 775–779
Brun Y, Ernst MD (2004) Finding latent code errors via machine learning over program executions. In: Proceedings of 26th international conference on software engineering, 2004 (ICSE 2004). IEEE, pp 480–490
Le T-DB, Oentaryo RJ, Lo D (2015) Information retrieval and spectrum based bug localization: better together. In: FSE. ACM, pp 579–590
Hoang TV-D, Oentaryo RJ, Le T-DB, Lo D (2018) Network-clustered multi-modal bug localization. IEEE Trans Softw Eng 45(10):1002–1023
Weiser M (1982) Programmers use slices when debugging. Commun ACM 25(7):446–452
Manevich R, Sridharan M, Adams S, Das M, Yang Z (2004) Pse: explaining program failures via postmortem static analysis. In: ACM SIGSOFT software engineering notes, vol 29, no. 6. ACM, pp 63–72
Acharya M, Robinson B (2011) Practical change impact analysis based on static program slicing for industrial software systems. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 746–755
Jeong G, Kim S, Zimmermann T (2009) Improving bug triage with bug tossing graphs. In: ESEC/FSE. ACM, pp 111–120
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled lda: a supervised topic model for credit attribution in multi-labeled corpora. In: EMNLP. Association for Computational Linguistics, pp 248–256
Asuncion A, Welling M, Smyth P, Teh YW (2009) On smoothing and inference for topic models. In: Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. The Association for Uncertainty in Artificial Intelligence Press, pp 27–34
Porteous I, Newman D, Ihler A, Asuncion A, Smyth P, Welling M (2008) Fast collapsed Gibbs sampling for latent Dirichlet allocation. In: KDD. ACM, pp 569–577
Si X, Sun M (2009) Tag-lda for scalable real-time tag recommendation. J Comput Inf Syst 6(1):23–31
Salton G, Wong A, Yang C-S (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61690204, 61672274, 61702252) and the Collaborative Innovation Center of Novel Software Technology and Industrialization. Hanghang Tong is partially supported by NSF (1651203, 1715385, and 1939725).
Additional information
Hanghang Tong: The work was partly done while the author was at Arizona State University.
About this article
Cite this article
Wang, Y., Yao, Y., Tong, H. et al. Enhancing supervised bug localization with metadata and stack-trace. Knowl Inf Syst 62, 2461–2484 (2020). https://doi.org/10.1007/s10115-019-01426-2
DOI: https://doi.org/10.1007/s10115-019-01426-2