Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks

Authors:

Yan Lei

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension

Pages 117 - 127

https://doi.org/10.1145/3387904.3389263

Published: 12 September 2020

Abstract

Developers rely on bug reports to fix bugs, and these reports are usually stored and managed in bug tracking systems. Because reporters have different habits of expression, they may describe the same bug in different words, so a bug tracking system often contains many duplicate bug reports. Automatically detecting these duplicates would save a large amount of bug-analysis effort. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel representation of a bug report pair, the dual-channel matrix, formed by concatenating the two single-channel matrices that represent the individual bug reports. These pair representations are fed to a CNN model to capture the correlated semantic relationships between the two reports. Our approach then uses the resulting association features to classify whether a pair of bug reports is duplicate or not. We evaluate our approach on three large datasets from three open-source projects (Open Office, Eclipse, and Net Beans) and on a larger combined dataset; the classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. This performance exceeds that of two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
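
The dual-channel representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no preprocessing or dimension details, so the fixed report length `MAX_WORDS`, the embedding size `EMBED_DIM`, the whitespace tokenizer, and the `toy_embedding` lookup (a deterministic stand-in for a trained word-embedding model) are all assumptions made for illustration. The sketch also assumes that "concatenating" means stacking the two single-channel matrices along a channel axis, the way an image carries color channels.

```python
import zlib

import numpy as np

MAX_WORDS = 8  # assumed fixed report length (rows); not taken from the paper
EMBED_DIM = 4  # assumed embedding size (columns); not taken from the paper


def toy_embedding(word: str) -> np.ndarray:
    """Hypothetical stand-in for a trained word-embedding lookup."""
    # Seed a generator from the word so the same word always maps to the same vector.
    rng = np.random.default_rng(zlib.crc32(word.encode("utf-8")))
    return rng.standard_normal(EMBED_DIM)


def single_channel(report: str) -> np.ndarray:
    """Map one report to a MAX_WORDS x EMBED_DIM matrix, truncated or zero-padded."""
    matrix = np.zeros((MAX_WORDS, EMBED_DIM))
    for row, word in enumerate(report.lower().split()[:MAX_WORDS]):
        matrix[row] = toy_embedding(word)
    return matrix


def dual_channel(report_a: str, report_b: str) -> np.ndarray:
    """Stack two single-channel matrices into a 2 x MAX_WORDS x EMBED_DIM array."""
    return np.stack([single_channel(report_a), single_channel(report_b)], axis=0)


report_a = "Editor crashes when saving large file"
report_b = "Crash on save with big documents"
pair = dual_channel(report_a, report_b)
print(pair.shape)  # (2, 8, 4)
```

Because both reports sit in one array, a convolution filter that spans the channel axis sees the same word positions of both reports at once, which is one plausible reading of how the CNN captures relationships between the two reports. The classifier itself is omitted here.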

Cited By

Yang G, Ji J, Kim T (2025). Feature Learning via Correlation Analysis for Effective Duplicate Detection. Applied Sciences 15:3 (1411). Online publication date: 30-Jan-2025.
https://doi.org/10.3390/app15031411
Kim M, Kim Y, Lee E (2025). Production and test bug report classification based on transfer learning. Information and Software Technology 181 (107685). Online publication date: May-2025.
https://doi.org/10.1016/j.infsof.2025.107685
Montgomery L, Lüders C, Maalej W (2025). Mining Issue Trackers: Concepts and Techniques. Handbook on Natural Language Processing for Requirements Engineering (309-336). Online publication date: 6-Mar-2025.
https://doi.org/10.1007/978-3-031-73143-3_11

Index Terms

Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks
1. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Published In

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension

July 2020

481 pages

ISBN:9781450379588

DOI:10.1145/3387904

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPC '20

Sponsor:

SIGSOFT

ICPC '20: 28th International Conference on Program Comprehension

July 13 - 15, 2020

Seoul, Republic of Korea

Bibliometrics

Article Metrics

Total Citations: 51
Total Downloads: 540
Downloads (Last 12 months): 78
Downloads (Last 6 weeks): 7

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Trinkenreich B, Santos F, Stol K (2024). Predicting Attrition among Software Professionals: Antecedents and Consequences of Burnout and Engagement. ACM Transactions on Software Engineering and Methodology 33:8 (1-45). Online publication date: 2-Sep-2024.
https://dl.acm.org/doi/10.1145/3691629
Zhang Z, Tawsif F, Ryu K, Yu T, Halfond W (2024). Mobile Bug Report Reproduction via Global Search on the App UI Model. Proceedings of the ACM on Software Engineering 1:FSE (2656-2676). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660824
Wu X, Li H, Yoshioka N, Washizaki H, Khomh F (2024). Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (114-125). Online publication date: 12-Mar-2024.
https://doi.org/10.1109/SANER60148.2024.00019
Yao Y, Wang J, Hu Y, Wang L, Zhou Y, Chen J, Gai X, Wang Z, Liu W (2024). BugBlitz-AI: An Intelligent QA Assistant. 2024 IEEE 15th International Conference on Software Engineering and Service Science (ICSESS) (57-63). Online publication date: 13-Sep-2024.
https://doi.org/10.1109/ICSESS62520.2024.10719045
Zhang J, Xiao L, Li M, Meng Z, Li Y (2024). HYDBre: A Hybrid Retrieval Method for Detecting Duplicate Software Bug Reports. 2024 11th International Conference on Dependable Systems and Their Applications (DSA) (242-251). Online publication date: 2-Nov-2024.
https://doi.org/10.1109/DSA63982.2024.00040
Zheng W, Li Y, Wu X, Cheng J (2024). Duplicate Bug Report detection using Named Entity Recognition. Knowledge-Based Systems 284 (111258). Online publication date: Jan-2024.
https://doi.org/10.1016/j.knosys.2023.111258
Messaoud M, Chekaya R, Mkaouer M, Jenhani I, Aljedaani W (2024). PR-DupliChecker: detecting duplicate pull requests in Fork-based workflows. International Journal of System Assurance Engineering and Management 15:7 (3538-3550). Online publication date: 19-Jun-2024.
https://doi.org/10.1007/s13198-024-02361-4