Abstract
Bug localization uses collected bug reports to locate the buggy source files. State-of-the-art techniques fall short in three respects: (L1) the subtle difference between natural language and programming language, (L2) the noise in bug reports, and (L3) the multi-grained nature of programming language. To overcome these limitations, we propose DeMoB, a novel deep multimodal model for bug localization. It has three key features, each tailored to one of the three limitations. Specifically, DeMoB generates multimodal coordinated representations for both bug reports and source files to address L1. It further incorporates the AttL encoder to process bug reports, addressing L2, and the MDCL encoder to process source files, addressing L3. Extensive experiments on four large-scale real-world data sets demonstrate that DeMoB significantly outperforms existing techniques.
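To make the idea of coordinated representations concrete, the following minimal sketch ranks source files against a bug report once both have been mapped into a shared vector space. It is not the DeMoB implementation: the vectors are made-up placeholders standing in for encoder outputs, cosine similarity is an assumed scoring function, and the names `cosine`, `rank_files`, and the file names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_files(report_vec, file_vecs):
    """Return (file name, score) pairs sorted from most to least similar to the report."""
    scored = [(name, cosine(report_vec, vec)) for name, vec in file_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: in a real system these would come from the
# bug-report encoder and the source-file encoder, respectively.
report_vec = [0.9, 0.1, 0.0]
file_vecs = {
    "Parser.java": [0.8, 0.2, 0.1],
    "Logger.java": [0.0, 0.1, 0.9],
    "Cache.java": [0.3, 0.9, 0.1],
}

ranking = rank_files(report_vec, file_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these placeholder vectors, the file whose embedding points in nearly the same direction as the report (`Parser.java`) is ranked first. The sketch only illustrates why a shared space is useful: once both modalities live in it, localization reduces to a similarity ranking.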
Notes
A bug tracking system for both free and open-source software, proprietary projects, and products. https://www.bugzilla.org.
Acknowledgements
This research was supported by the Natural Science Foundation of China (No. 61772284), the State Key Lab. for Novel Software Technology (KFKT2020B21), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (SJKY19_0763). Hanghang Tong is partially supported by NSF (1947135 and 2003924).
Additional information
Responsible editor: Johannes Fürnkranz.
About this article
Cite this article
Zhu, Z., Li, Y., Wang, Y. et al. A deep multimodal model for bug localization. Data Min Knowl Disc 35, 1369–1392 (2021). https://doi.org/10.1007/s10618-021-00755-7
DOI: https://doi.org/10.1007/s10618-021-00755-7