A contextual approach towards more accurate duplicate bug report detection and ranking

Empirical Software Engineering

Abstract

The issue-tracking systems used by software projects contain issues, bugs, or tickets written by a wide variety of reporters with different levels of training and knowledge about the system under development. Typically, reporters lack the skills or the time to search the issue-tracking system for similar issues that have already been reported. As a result, many reports end up referring to the same issue, which makes bug-report triaging time-consuming and error-prone. Many researchers have approached the bug-deduplication problem using off-the-shelf information-retrieval (IR) tools. In this work, we extend the state of the art by investigating how contextual information about software-quality attributes, software-architecture terms, and system-development topics can be exploited to improve bug deduplication. We demonstrate the effectiveness of our contextual bug-deduplication method at ranking duplicates on the bug repositories of the Android, Eclipse, Mozilla, and OpenOffice software systems. Based on this experience, we conclude that taking domain-specific context into account can improve IR methods for bug deduplication.
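To make the idea of contextual features concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each context (for example, a software-quality attribute) is represented by a word list, scores a report by counting matches against each list, and ranks existing reports by the cosine similarity of those context vectors. The word lists, the function names, and the use of raw counts in place of an IR score are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical context word lists (assumption: the paper's actual
# lists are not reproduced here).
CONTEXT_WORD_LISTS = {
    "efficiency": {"slow", "performance", "memory", "latency", "lag"},
    "usability": {"ui", "button", "screen", "menu", "layout"},
    "reliability": {"crash", "freeze", "exception", "hang", "error"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_vector(text):
    """Count, per context, how many tokens of the report match its word list."""
    counts = Counter(tokenize(text))
    return [sum(counts[w] for w in words) for words in CONTEXT_WORD_LISTS.values()]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(new_report, existing_reports):
    """Return (score, report) pairs sorted from most to least contextually similar."""
    query = context_vector(new_report)
    scored = [(cosine(query, context_vector(r)), r) for r in existing_reports]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    existing = [
        "Crash and exception on startup",
        "Scrolling is slow and uses too much memory",
    ]
    for score, report in rank_candidates(
        "App crash with exception when opening menu", existing
    ):
        print(f"{score:.3f}  {report}")
```

In practice such contextual scores would be combined with textual-similarity features rather than used alone; the sketch isolates the contextual part only.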

References

  • Aggarwal K, Rutgers T, Timbers F, Hindle A, Greiner R, Stroulia E (2015) Detecting duplicate bug reports with software engineering domain knowledge. In: Guéhéneuc Y-G, Adams B, Serebrenik A (eds) 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015, Montreal, QC, Canada, March 2-6, 2015, pp 211–220. IEEE

  • Alipour A, Hindle A, Stroulia E (2013) A contextual approach towards more accurate duplicate bug report detection. In: Proceedings of the Tenth International Workshop on Mining Software Repositories, pp 183–192. IEEE Press

  • Anvik J, Hiew L, Murphy GC (2005) Coping with an open bug repository. In: Proceedings of the 2005 OOPSLA workshop on Eclipse technology eXchange, pp 35–39. ACM

  • Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on Software engineering, pp 361–370. ACM

  • Ayewah N, Pugh W (2010) The google findbugs fixit. In: Proceedings of the 19th international symposium on Software testing and analysis, pp 241–252. ACM

  • Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Duplicate bug reports considered harmful really? In: 2008 IEEE International Conference on Software Maintenance, ICSM 2008, pp 337–345. IEEE

  • Brown A, Wilson G (2011) The Architecture Of Open Source Applications. lulu.com

  • Buckley C, Voorhees EM (2000) Evaluating evaluation measure stability. ACM, New York, pp 33–40

  • Android Community (2013) Android Technical Information. http://source.android.com/tech/security/

  • Ernst NA, Mylopoulos J (2010) On the perception of software quality requirements during the project lifecycle. In: Wieringa R, Persson A (eds) Requirements Engineering: Foundation for Software Quality, volume 6182 of Lecture Notes in Computer Science, pp 143–157. Springer, Berlin

  • Grosskurth A, Godfrey MW (2006) Architecture and evolution of the modern web browser. Preprint submitted to Elsevier Science

  • Guana V, Rocha F, Hindle A, Stroulia E (2012) Do the stars align? Multidimensional analysis of Android’s layered architecture. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 124–127. IEEE

  • Han D, Zhang C, Fan X, Hindle A, Wong K, Stroulia E (2012) Understanding android fragmentation with topic analysis of vendor-specific bugs. IEEE

  • Hangal S, Lam MS (2002) Tracking down software bugs using automatic anomaly detection. In: Proceedings of the 24th international conference on Software engineering, pp 291–301. ACM

  • Hiew L (2006) Assisted detection of duplicate bug reports. PhD thesis, The University Of British Columbia

  • Hindle A, Ernst NA, Godfrey MW, Mylopoulos J (2011) Automated topic naming to support cross-project analysis of software maintenance activities. In: Proceedings of the 8th Working Conference on Mining Software Repositories, pp 163–172. ACM

  • Holmes G, Donkin A, Witten IH (1994) Weka: A machine learning workbench. In: Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, 1994, pp 357–361. IEEE

  • Jalbert N, Weimer W (2008) Automated duplicate detection for bug tracking systems. In: 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC, DSN 2008, pp 52–61. IEEE

  • Kayed A, Hirzalla N, Samhan AA, Alfayoumi M (2009) Towards an ontology for software product quality attributes. In: ICIW’09 Fourth International Conference on Internet and Web Applications and Services, 2009, pp 200–204. IEEE

  • Langford J, Li L, Strehl A (2007) Vowpal wabbit online learning project

  • Sun Microsystems (2000) The openoffice.org source project: Technical overview. http://www.immagic.com/eLibrary/ARCHIVES/GENERAL/SUN/OPENOFCT.pdf

  • Monard MC, Batista GE (2002) Learning with skewed class distributions. Advances in Logic, Artificial Intelligence, and Robotics: LAPTEC 2002 85:173

  • Nagwani NK, Singh P (2009) Weight similarity measurement model based, object oriented approach for bug databases mining to detect similar and duplicate bugs. In: Proceedings of the International Conference on Advances in Computing, Communication and Control, pp 202–207. ACM

  • Nakashima T, Oyama M, Hisada H, Ishii N (1999) Analysis of software bug causes and its prevention. Inf Softw Technol 41(15):1059–1068

  • Nguyen AT, Nguyen TT, Nguyen TN, Lo D, Sun C (2012) Duplicate bug report detection with a combination of information retrieval and topic modeling. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp 70–79. ACM

  • Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

  • Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M et al (1995) Okapi at trec-3. NIST SPECIAL PUBLICATION SP:109–109

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: 2007 29th International Conference on Software Engineering, ICSE 2007, pp 499–510. IEEE

  • Serrano N, Ciordia I (2005) Bugzilla, ITracker, and other bug trackers. IEEE Softw 22(2):11–13

  • Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, pp 253–262. IEEE Computer Society

  • Sun C, Lo D, Wang X, Jiang J, Khoo SC (2010) A discriminative model approach for accurate duplicate bug report retrieval. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, pp 45–54. ACM

  • Sureka A, Jalote P (2010) Detecting duplicate bug report using character n-gram-based features. In: 2010 17th Asia Pacific Software Engineering Conference (APSEC), pp 366–374. IEEE

  • Van Hulse J, Khoshgoftaar TM, Napolitano A (2007) Experimental perspectives on learning from imbalanced data. In: Proceedings of the 24th international conference on Machine learning, pp 935–942. ACM

  • Wallace BC, Dahabreh IJ (2012) Class probability estimates are unreliable for imbalanced data (and how to fix them). In: ICDM, pp 695–704

  • Wallace BC, Small K, Brodley CE, Trikalinos TA (2011) Class imbalance, redux. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp 754–763. IEEE

  • Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on Software engineering, pp 461–470. ACM

  • Zaragoza H, Craswell N, Taylor MJ, Saria S, Robertson SE (2004) Microsoft cambridge at trec 13: Web and hard tracks. In: TREC, vol 4, pp 1–1. Citeseer

Acknowledgments

We would like to thank Sun et al. (2011) for sharing their Eclipse, OpenOffice, and Mozilla datasets with us. Abram Hindle and Eleni Stroulia were supported by NSERC Discovery Grants.

Corresponding author

Correspondence to Abram Hindle.

Additional information

Communicated by: Massimiliano Di Penta and Sung Kim

About this article

Cite this article

Hindle, A., Alipour, A. & Stroulia, E. A contextual approach towards more accurate duplicate bug report detection and ranking. Empir Software Eng 21, 368–410 (2016). https://doi.org/10.1007/s10664-015-9387-3

  • DOI: https://doi.org/10.1007/s10664-015-9387-3