ABSTRACT
Developers rely on bug reports to fix bugs. Bug reports are usually stored and managed in bug tracking systems. Because reporters differ in how they express themselves, different reporters may describe the same bug in different words. As a result, a bug tracking system often contains many duplicate bug reports, and detecting these duplicates automatically would save a large amount of bug analysis effort. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, in this paper we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel representation for a pair of bug reports: a dual-channel matrix formed by concatenating the two single-channel matrices that represent the individual reports. These bug report pairs are fed to a CNN model to capture the correlated semantic relationships between the two reports. Our approach then uses the resulting association features to classify whether a pair of bug reports is a duplicate or not. We evaluate our approach on three large datasets from three open-source projects (Open Office, Eclipse, and Net Beans) and on a larger combined dataset; the classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. This performance exceeds that of two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
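The dual-channel pair representation described in the abstract can be illustrated with a minimal sketch. It assumes that each report is already tokenized, that each word maps to a fixed-length embedding vector, and that reports are padded or truncated to a common length. The helper names (`report_to_matrix`, `pair_to_dual_channel`), the toy vocabulary, the random embeddings, and the zero-vector handling of unknown words are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import numpy as np


def report_to_matrix(tokens, embeddings, max_len, dim):
    """Map a tokenized bug report to a (max_len, dim) single-channel matrix."""
    matrix = np.zeros((max_len, dim), dtype=np.float32)
    for row, token in enumerate(tokens[:max_len]):
        # Unknown tokens stay as zero rows (an assumption for this sketch).
        matrix[row] = embeddings.get(token, np.zeros(dim, dtype=np.float32))
    return matrix


def pair_to_dual_channel(tokens_a, tokens_b, embeddings, max_len, dim):
    """Stack two single-channel matrices into one (2, max_len, dim) input."""
    a = report_to_matrix(tokens_a, embeddings, max_len, dim)
    b = report_to_matrix(tokens_b, embeddings, max_len, dim)
    return np.stack([a, b], axis=0)


# Toy embeddings: random vectors stand in for trained word vectors.
rng = np.random.default_rng(0)
vocab = ["crash", "on", "save", "file", "app", "freezes"]
DIM, MAX_LEN = 4, 5
embeddings = {w: rng.standard_normal(DIM).astype(np.float32) for w in vocab}

pair = pair_to_dual_channel(
    ["crash", "on", "save"],
    ["app", "freezes", "on", "save", "file"],
    embeddings,
    MAX_LEN,
    DIM,
)
print(pair.shape)
```

The resulting array has one channel per report, so a convolution filter that spans both channels sees the two reports at the same positions simultaneously, which is one plausible reading of how the CNN captures relationships between the reports.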