A contextual approach towards more accurate duplicate bug report detection and ranking

Hindle, Abram; Alipour, Anahita; Stroulia, Eleni

doi:10.1007/s10664-015-9387-3

Published: 28 June 2015

Empirical Software Engineering, volume 21, pages 368–410 (2016)

Abram Hindle¹,
Anahita Alipour¹ &
Eleni Stroulia¹

Abstract

The issue-tracking systems used by software projects contain issues, bugs, or tickets written by a wide variety of bug reporters, with different levels of training and knowledge about the system under development. Typically, reporters lack the skills and/or time to search the issue-tracking system for similar issues already reported. As a result, many reports end up referring to the same issue, which effectively makes the bug-report triaging process time consuming and error prone. Many researchers have approached the bug-deduplication problem using off-the-shelf information-retrieval (IR) tools. In this work, we extend the state of the art by investigating how contextual information about software-quality attributes, software-architecture terms, and system-development topics can be exploited to improve bug deduplication. We demonstrate the effectiveness of our contextual bug-deduplication method at ranking duplicates on the bug repositories of the Android, Eclipse, Mozilla, and OpenOffice software systems. Based on this experience, we conclude that taking into account domain-specific context can improve IR methods for bug deduplication.
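
As a rough illustration of the idea, and not the authors' implementation, the sketch below compares two bug reports using both plain textual overlap and a contextual feature vector built from small word lists. The word lists in `CONTEXT`, the helper names (`context_vector`, `pair_features`), and the choice of Jaccard and cosine similarity are all assumptions made for this example.

```python
import math
import re

# Hypothetical context word lists (assumption for illustration only);
# a real study would use curated lists of quality-attribute,
# architecture, or topic terms.
CONTEXT = {
    "efficiency": {"slow", "performance", "memory", "lag"},
    "reliability": {"crash", "freeze", "error", "exception"},
    "usability": {"confusing", "menu", "button", "layout"},
}


def tokens(text):
    """Lowercase a report and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def context_vector(text):
    """Count how many tokens of the report fall into each context list."""
    toks = tokens(text)
    return [sum(t in words for t in toks) for words in CONTEXT.values()]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap of the token sets of two reports."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pair_features(report_a, report_b):
    """Textual and contextual similarity features for a pair of reports."""
    return {
        "textual": jaccard(report_a, report_b),
        "contextual": cosine(context_vector(report_a), context_vector(report_b)),
    }
```

Features of this kind could be computed for each candidate pair and passed to a classifier or a ranking function. How the features are weighted, and which similarity measures work best, are the empirical questions the article addresses.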

References

Aggarwal K, Rutgers T, Timbers F, Hindle A, Greiner R, Stroulia E (2015) Detecting duplicate bug reports with software engineering domain knowledge. In: Guéhéneuc Y-G, Adams B, Serebrenik A (eds) 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015, Montreal, QC, Canada, March 2-6, 2015, pp 211–220. IEEE
Alipour A, Hindle A, Stroulia E (2013) A contextual approach towards more accurate duplicate bug report detection. In: Proceedings of the Tenth International Workshop on Mining Software Repositories, pp 183–192. IEEE Press
Anvik J, Hiew L, Murphy GC (2005) Coping with an open bug repository. In: Proceedings of the 2005 OOPSLA workshop on Eclipse technology eXchange, pp 35–39. ACM
Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on Software engineering, pp 361–370. ACM
Ayewah N, Pugh W (2010) The google findbugs fixit. In: Proceedings of the 19th international symposium on Software testing and analysis, pp 241–252. ACM
Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Duplicate bug reports considered harmful really? In: 2008 IEEE International Conference on Software Maintenance, ICSM 2008, pp 337–345. IEEE
Brown A, Wilson G (2011) The Architecture Of Open Source Applications. lulu.com
Buckley C, Voorhees EM (2000) Evaluating evaluation measure stability. ACM, New York, pp 33–40
Android Community (2013) Android Technical Information. http://source.android.com/tech/security/
Ernst NA, Mylopoulos J (2010) On the perception of software quality requirements during the project lifecycle. In: Wieringa R, Persson A (eds) Requirements Engineering: Foundation for Software Quality, volume 6182 of Lecture Notes in Computer Science, pp 143–157. Springer, Berlin
Grosskurth A, Godfrey MW (2006) Architecture and evolution of the modern web browser. Preprint submitted to Elsevier Science
Guana V, Rocha F, Hindle A, Stroulia E (2012) Do the stars align? Multidimensional analysis of Android’s layered architecture. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 124–127. IEEE
Han D, Zhang C, Fan X, Hindle A, Wong K, Stroulia E (2012) Understanding android fragmentation with topic analysis of vendor-specific bugs. IEEE
Hangal S, Lam MS (2002) Tracking down software bugs using automatic anomaly detection. In: Proceedings of the 24th international conference on Software engineering, pp 291–301. ACM
Hiew L (2006) Assisted detection of duplicate bug reports. PhD thesis, The University Of British Columbia
Hindle A, Ernst NA, Godfrey MW, Mylopoulos J (2011) Automated topic naming to support cross-project analysis of software maintenance activities. In: Proceedings of the 8th Working Conference on Mining Software Repositories, pp 163–172. ACM
Holmes G, Donkin A, Witten IH (1994) Weka: A machine learning workbench. In: Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, 1994, pp 357–361. IEEE
Jalbert N, Weimer W (2008) Automated duplicate detection for bug tracking systems. In: 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC, DSN 2008, pp 52–61. IEEE
Kayed A, Hirzalla N, Samhan AA, Alfayoumi M (2009) Towards an ontology for software product quality attributes. In: ICIW’09 Fourth International Conference on Internet and Web Applications and Services, 2009, pp 200–204. IEEE
Langford J, Li L, Strehl A (2007) Vowpal wabbit online learning project
Sun Microsystems (2000) The openoffice.org source project: Technical overview. http://www.immagic.com/eLibrary/ARCHIVES/GENERAL/SUN/OPENOFCT.pdf
Monard MC, Batista GE (2002) Learning with skewed class distributions. Advances in Logic, Artificial Intelligence, and Robotics: LAPTEC 2002 85:173
Nagwani NK, Singh P (2009) Weight similarity measurement model based, object oriented approach for bug databases mining to detect similar and duplicate bugs. In: Proceedings of the International Conference on Advances in Computing, Communication and Control, pp 202–207. ACM
Nakashima T, Oyama M, Hisada H, Ishii N (1999) Analysis of software bug causes and its prevention. Inf Softw Technol 41(15):1059–1068
Nguyen AT, Nguyen TT, Nguyen TN, Lo D, Sun C (2012) Duplicate bug report detection with a combination of information retrieval and topic modeling. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp 70–79. ACM
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora
Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M et al (1995) Okapi at trec-3. NIST SPECIAL PUBLICATION SP:109–109
Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: 2007 29th International Conference on Software Engineering, ICSE 2007, pp 499–510. IEEE
Serrano N, Ciordia I (2005) Bugzilla, ITracker, and other bug trackers. IEEE Softw 22(2):11–13
Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, pp 253–262. IEEE Computer Society
Sun C, Lo D, Wang X, Jiang J, Khoo SC (2010) A discriminative model approach for accurate duplicate bug report retrieval. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, pp 45–54. ACM
Sureka A, Jalote P (2010) Detecting duplicate bug report using character n-gram-based features. In: 2010 17th Asia Pacific Software Engineering Conference (APSEC), pp 366–374. IEEE
Van Hulse J, Khoshgoftaar TM, Napolitano A (2007) Experimental perspectives on learning from imbalanced data. In: Proceedings of the 24th international conference on Machine learning, pp 935–942. ACM
Wallace BC, Dahabreh IJ (2012) Class probability estimates are unreliable for imbalanced data (and how to fix them). In: ICDM, pp 695–704
Wallace BC, Small K, Brodley CE, Trikalinos TA (2011) Class imbalance, redux. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp 754–763. IEEE
Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on Software engineering, pp 461–470. ACM
Zaragoza H, Craswell N, Taylor MJ, Saria S, Robertson SE (2004) Microsoft cambridge at trec 13: Web and hard tracks. In: TREC, vol 4, pp 1–1. Citeseer

Acknowledgments

We would like to thank Sun et al. (2011) for sharing their Eclipse, OpenOffice, and Mozilla datasets with us. Abram Hindle and Eleni Stroulia were supported by NSERC Discovery Grants.

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Abram Hindle, Anahita Alipour & Eleni Stroulia

Corresponding author

Correspondence to Abram Hindle.

Additional information

Communicated by: Massimiliano Di Penta and Sung Kim

About this article

Cite this article

Hindle, A., Alipour, A. & Stroulia, E. A contextual approach towards more accurate duplicate bug report detection and ranking. Empir Software Eng 21, 368–410 (2016). https://doi.org/10.1007/s10664-015-9387-3

Published: 28 June 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s10664-015-9387-3
