
A deep multimodal model for bug localization

Data Mining and Knowledge Discovery

Abstract

Bug localization uses collected bug reports to locate the buggy source files. The state of the art falls short in three respects: (L1) the subtle differences between natural language and programming language, (L2) the noise in bug reports, and (L3) the multi-grained nature of programming language. To overcome these limitations, we propose DeMoB, a novel deep multimodal model for bug localization. It has three key features, each tailored to one of these limitations. Specifically, DeMoB generates multimodal coordinated representations of both bug reports and source files to address L1, uses the AttL encoder to process bug reports to address L2, and uses the MDCL encoder to process source files to address L3. Extensive experiments on four large-scale real-world data sets show that DeMoB significantly outperforms existing techniques.
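The abstract does not describe how DeMoB builds its coordinated representations, so the following is only a minimal sketch of the general idea, under stated assumptions. In a coordinated (as opposed to joint) representation, each modality is encoded separately into a shared space, and the two encodings are compared rather than fused. Here, hypothetical linear projections `W_report` and `W_code` stand in for learned encoders, source files are ranked by cosine similarity to the bug report, and a margin ranking loss (`hinge_ranking_loss`) illustrates one common way such a space can be trained. All function names, matrices, and feature vectors are hypothetical. None of this reproduces the AttL or MDCL encoders.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def project(W: Matrix, x: Sequence[float]) -> List[float]:
    """Map a modality-specific feature vector into the shared space: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(report: Sequence[float],
               files: Dict[str, Sequence[float]],
               W_report: Matrix,
               W_code: Matrix) -> List[Tuple[str, float]]:
    """Rank source files by similarity to one bug report in the shared space.

    Each modality has its own projection (hypothetical W_report vs. W_code).
    The two are never concatenated, only compared, which is what makes the
    representation 'coordinated' rather than 'joint'.
    """
    q = project(W_report, report)
    scored = [(name, cosine(q, project(W_code, feat))) for name, feat in files.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def hinge_ranking_loss(report, buggy, clean, W_report, W_code, margin=0.5):
    """Hypothetical margin loss: score the buggy file at least `margin` above a clean one."""
    q = project(W_report, report)
    pos = cosine(q, project(W_code, buggy))
    neg = cosine(q, project(W_code, clean))
    return max(0.0, margin - pos + neg)


if __name__ == "__main__":
    # Toy, made-up numbers: 3-d report features, 2-d code features, 2-d shared space.
    W_report = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    W_code = [[0.0, 1.0], [1.0, 0.0]]
    report = [1.0, 0.0, 5.0]
    files = {"Parser.java": [0.0, 2.0], "Widget.java": [1.0, 0.0], "Util.java": [1.0, 1.0]}
    for name, score in rank_files(report, files, W_report, W_code):
        print(f"{name}: {score:.3f}")
```

With these toy inputs the ranking is `Parser.java` (1.000), `Util.java` (0.707), `Widget.java` (0.000). Keeping the two projections separate lets each modality use an encoder suited to its own structure (natural language vs. code), while the similarity constraint ties them to a common space.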




Notes

  1. A bug tracking system used for free and open-source software as well as proprietary projects and products. https://www.bugzilla.org.

  2. https://www.eclipse.org/aspectj/.

  3. https://www.eclipse.org/jdt/.

  4. https://www.eclipse.org/swt/.

  5. https://www.eclipse.org/eclipse/platform-ui/.

  6. https://pytorch.org/.


Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 61772284), the State Key Laboratory for Novel Software Technology (KFKT2020B21), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (SJKY19_0763). Hanghang Tong is partially supported by NSF grants 1947135 and 2003924.

Author information


Correspondence to Yun Li.

Additional information

Responsible editor: Johannes Fürnkranz.



About this article


Cite this article

Zhu, Z., Li, Y., Wang, Y. et al. A deep multimodal model for bug localization. Data Min Knowl Disc 35, 1369–1392 (2021). https://doi.org/10.1007/s10618-021-00755-7


  • DOI: https://doi.org/10.1007/s10618-021-00755-7
