DOI: 10.1145/3387904.3389263
research-article

Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks

Published: 12 September 2020

ABSTRACT

Developers rely on bug reports to fix bugs, and these reports are usually stored and managed in bug tracking systems. Because reporters have different habits of expression, they may describe the same bug in different words. As a result, a bug tracking system often contains many duplicate bug reports. Automatically detecting these duplicates would save a large amount of bug-analysis effort. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel representation for a bug report pair: a dual-channel matrix formed by concatenating two single-channel matrices, each representing one bug report. These bug report pairs are fed to a CNN model to capture the correlated semantic relationships between the two reports. Our approach then uses these association features to classify whether a pair of bug reports is a duplicate. We evaluate our approach on three large datasets from three open-source projects (Open Office, Eclipse, and Net Beans) and on a larger combined dataset; classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. This performance exceeds that of the two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
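
To make the dual-channel idea concrete, here is a minimal sketch of how two bug reports might be turned into a single two-channel input. It is an illustration only, not the paper's implementation: the function names (`embed_report`, `dual_channel`), the whitespace tokenizer, the random stand-in word vectors, the sizes `max_words=20` and `dim=8`, and the channel-first layout are all assumptions, since the abstract specifies none of them.

```python
import numpy as np


def embed_report(text, embeddings, rng, max_words=20, dim=8):
    """Build a (max_words, dim) single-channel matrix for one report.

    Assumption: random vectors stand in for trained word embeddings.
    Unused rows stay zero, so every report has the same shape.
    """
    matrix = np.zeros((max_words, dim), dtype=np.float32)
    tokens = text.lower().split()[:max_words]  # naive tokenizer, truncated
    for row, token in enumerate(tokens):
        if token not in embeddings:
            # Cache the vector so the same word maps to the same row values
            # in both reports.
            embeddings[token] = rng.standard_normal(dim).astype(np.float32)
        matrix[row] = embeddings[token]
    return matrix


def dual_channel(report_a, report_b, embeddings, rng, max_words=20, dim=8):
    """Stack two single-channel matrices into one (2, max_words, dim) array."""
    a = embed_report(report_a, embeddings, rng, max_words, dim)
    b = embed_report(report_b, embeddings, rng, max_words, dim)
    return np.stack([a, b], axis=0)  # channel-first layout (an assumption)


rng = np.random.default_rng(0)
embeddings = {}
pair = dual_channel("Editor crashes when saving a file",
                    "Crash on save in editor", embeddings, rng)
print(pair.shape)
```

A two-dimensional convolution over such an array sees both reports at every position, which is one plausible reading of how a CNN could pick up relationships between the paired reports rather than features of each report in isolation.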


Published in

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension
July 2020
481 pages
ISBN: 9781450379588
DOI: 10.1145/3387904

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
