Revisiting the performance of automated approaches for the retrieval of duplicate reports in issue tracking systems that perform just-in-time duplicate retrieval

Empirical Software Engineering

Abstract

Issue tracking systems (ITSs) allow software end-users and developers to file issue reports and change requests. Duplicate reports are frequently filed for the same software issue, and retrieving these duplicate issue reports is a tedious manual task. Prior research proposed several automated approaches for the retrieval of duplicate issue reports. Recent versions of ITSs added a feature that performs basic retrieval of duplicate issue reports at the filing time of an issue report, in an effort to avoid the filing of duplicates as early as possible. This paper investigates the impact of this just-in-time duplicate retrieval on the duplicate reports that end up in the ITS of an open source project. In particular, we study the differences between duplicate reports for open source projects before and after the activation of this new feature. We show how the experimental results of prior research would vary given the new data after the activation of the just-in-time duplicate retrieval feature. We study duplicate issue reports from the Mozilla-Firefox, Mozilla-Core and Eclipse-Platform projects. In addition, we compare the performance of two popular state-of-the-art approaches for the automated retrieval of duplicate reports (i.e., BM25F and REP). We find that duplicate issue reports after the activation of the just-in-time duplicate retrieval feature are less textually similar, have a greater identification delay and require more discussion to be retrieved as duplicate reports than duplicates before the activation of the feature. Prior work showed that REP outperforms BM25F in terms of recall rate and mean average precision. We observe that the performance gap between BM25F and REP becomes even larger after the activation of the just-in-time duplicate retrieval feature.
We recommend that future studies focus on duplicates that were reported after the activation of the just-in-time duplicate retrieval feature, as these duplicates are more representative of future incoming issue reports and therefore give a better representation of the future performance of proposed approaches.
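
The abstract compares approaches by recall rate and mean average precision. As a rough illustration only, the sketch below computes both metrics for ranked candidate lists. It uses the common information-retrieval definitions, which may differ in detail from those used in the paper; the function names, report identifiers and example rankings are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     relevant: Dict[str, Set[str]], k: int) -> float:
    """Fraction of queried duplicates with a relevant report in the top k."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if any(candidate in relevant[query] for candidate in ranked[:k])
    )
    return hits / len(rankings)


def average_precision(ranked: List[str], rel: Set[str]) -> float:
    """Average of precision values at each rank where a relevant report occurs."""
    hits = 0
    total = 0.0
    for position, candidate in enumerate(ranked, start=1):
        if candidate in rel:
            hits += 1
            total += hits / position
    return total / len(rel) if rel else 0.0


def mean_average_precision(rankings: Dict[str, List[str]],
                           relevant: Dict[str, Set[str]]) -> float:
    """Mean of the per-query average precision."""
    return sum(
        average_precision(ranked, relevant[query])
        for query, ranked in rankings.items()
    ) / len(rankings)


# Hypothetical data: each key is a newly filed duplicate, each list is the
# ranked candidates returned by some retrieval approach.
rankings = {
    "dup1": ["a", "m1", "b"],
    "dup2": ["m2", "c", "d"],
    "dup3": ["x", "y", "z"],
}
# Hypothetical ground truth: the report(s) each duplicate should be linked to.
relevant = {"dup1": {"m1"}, "dup2": {"m2"}, "dup3": {"m3"}}

print(recall_rate_at_k(rankings, relevant, 1))
print(recall_rate_at_k(rankings, relevant, 2))
print(mean_average_precision(rankings, relevant))
```

When every duplicate has exactly one relevant report, as in this example, average precision reduces to the reciprocal of the rank at which that report appears.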

Notes

  1. https://www.bugzilla.org/installation-list/

  2. http://dev.mysql.com/doc/internals/en/full-text-search.html

  3. http://dev.mysql.com/doc/internals/en/full-text-search.html

  4. https://github.com/bugzilla/bugzilla/blob/master/Bugzilla/Bug.pm#L599

  5. http://bugzilla.mozilla.org/show_bug.cgi?id=22353

  6. https://www.bugzilla.org/news/

  7. https://bugs.eclipse.org/bugs/show_bug.cgi?id=359299

  8. http://www.comp.nus.edu.sg/specmine/suncn/ase11/index.html

  9. Issue#393235: https://bugs.eclipse.org/bugs/show_bug.cgi?id=393235. We manually verified that this issue still persists.

  10. https://alm-help.saas.hpe.com/en/12.55/online_help/Content/UG/ui_similar_defects.htm

  11. https://www.computecanada.ca/

  12. http://cac.queensu.ca/

References

  • Aggarwal K, Rutgers T, Timbers F, Hindle A, Greiner R, Stroulia E (2015) Detecting duplicate bug reports with software engineering domain knowledge. In: Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, pp 211–220

  • Alipour A, Hindle A, Stroulia E (2013) A contextual approach towards more accurate duplicate bug report detection. In: Proceedings of the 10th Working Conference on Mining Software Repositories (MSR), pp 183–192

  • Anvik J, Hiew L, Murphy GC (2005) Coping with an open bug repository. In: Proceedings of the OOPSLA Workshop on Eclipse Technology eXchange (Eclipse). ACM, pp 35–39

  • Banerjee S, Syed Z, Helmick J, Culp M, Ryan K, Cukic B (2017) Automated triaging of very large bug repositories. Inf Softw Technol 89(Supplement C):1–13

  • Berry MW, Castellanos M (2004) Survey of text mining. Comput Rev 45(9):548

  • Bettenburg N, Just S, Schröter A, Weiß C, Premraj R, Zimmermann T (2007) Quality of bug reports in Eclipse. In: Proceedings of the OOPSLA Workshop on Eclipse Technology eXchange (Eclipse). ACM, pp 21–25

  • Bettenburg N, Just S, Schröter A, Weiss C, Premraj R, Zimmermann T (2008) What makes a good bug report? In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT/FSE). ACM, pp 308–318

  • Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Duplicate bug reports considered harmful...really? In: Proceedings of the 24th International Conference on Software Maintenance (ICSM). IEEE, pp 337–345

  • Borg M, Runeson P (2014) Changes, evolution, and bugs. Springer, Berlin, pp 477–509

  • Borg M, Runeson P, Johansson J, Mäntylä MV (2014) A replicated study on duplicate detection: Using Apache Lucene to search among Android defects. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). ACM, New York, pp 8:1–8:4

  • Bugzilla Release notes for Bugzilla 4.0 (2017) https://www.bugzilla.org/releases/4.0/release-notes.html. Last visited on 11/12/2017

  • Cavalcanti YC, Neto PAdMS, Lucrédio D, Vale T, de Almeida ES, de Lemos Meira SR (2013) The bug report duplication problem: an exploratory study. Softw Qual J 21(1):39–66

  • Cavalcanti YC, da Mota Silveira Neto PA, Machado IdC, Vale TF, de Almeida ES, Meira SRdL (2014) Challenges and opportunities for software change request repositories: a systematic mapping study. J Softw Evol Process 26(7):620–653

  • Chowdhury G (2010) Introduction to modern information retrieval. Facet Publishing, UK

  • Gehan EA (1965) A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52(1-2):203–223

  • Hamers L, Hemeryck Y, Herweyers G, Janssen M, Keters H, Rousseau R, Vanhoutte A (1989) Similarity measures in scientometric research: The Jaccard index versus Salton’s cosine formula. Inf Process Manag 25(3):315–318

  • Hassan AE (2008) The road ahead for mining software repositories. In: Proceedings of the Frontiers of Software Maintenance (FoSM). IEEE, pp 48–57

  • Hindle A (2016) Stopping duplicate bug reports before they start with Continuous Querying for bug reports. PeerJ Prepr 4:e2373v1

  • Hindle A, Alipour A, Stroulia E (2016) A contextual approach towards more accurate duplicate bug report detection and ranking. Empir Softw Eng 21(2):368–410

  • Jalbert N, Weimer W (2008) Automated duplicate detection for bug tracking systems. In: Proceedings of the 38th International Conference on Dependable Systems and Networks With FTCS and DCC (DSN). IEEE, pp 52–61

  • Jira Duplicate Detection (2017) https://marketplace.atlassian.com/plugins/com.deniz.jira.similarissues/server/overview. Last visited on 11/12/2017

  • Koponen T (2006) Life cycle of defects in open source software projects. In: Open Source Systems. Springer, pp 195–200

  • Lazar A, Ritchey S, Sharif B (2014) Improving the accuracy of duplicate bug report detection using textual similarity measures. In: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR). ACM, pp 308–311

  • Long JD, Feng D, Cliff N (2003) Ordinal analysis of behavioral data. Handbook of psychology

  • Mantis Bug Tracker (2017) https://www.mantisbt.org/. Last visited on 11/12/2017

  • Nagwani NK, Singh P (2009) Weight similarity measurement model based, object oriented approach for bug databases mining to detect similar and duplicate bugs. In: Proceedings of the 1st International Conference on Advances in Computing, Communication and Control (ICAC3). ACM, pp 202–207

  • Nguyen AT, Nguyen TT, Nguyen TN, Lo D, Sun C (2012) Duplicate bug report detection with a combination of information retrieval and topic modeling. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE). ACM, pp 70–79

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 311–318

  • Rakha MS, Shang W, Hassan AE (2016) Studying the needed effort for identifying duplicate issues. Empir Softw Eng (EMSE) 21(5):1960–1989

  • Rakha MS, Bezemer CP, Hassan AE (2017) Revisiting the Performance of Automated Approaches for the Retrieval of Duplicate Reports in Issue Tracking Systems that Perform Just-in-Time Duplicate Retrieval: Online Appendix. https://github.com/SAILResearch/replication-jit_duplicates. Last visited on 11/12/2017

  • Rakha MS, Bezemer CP, Hassan AE (2017) Revisiting the performance evaluation of automated approaches for the retrieval of duplicate issue reports. IEEE Trans Softw Eng (TSE) PP(99):1–27

  • RedMine Flexible Project Management (2017) https://www.redmine.org/. Last visited on 11/12/2017

  • Robertson S, Zaragoza H, Taylor M (2004) Simple BM25 extension to multiple weighted fields. In: Proceedings of the 13th International Conference on Information and Knowledge Management (CIKM). ACM, pp 42–49

  • Romano J, Kromrey JD, Coraggio J, Skowronek J, Devine L (2006) Exploring methods for evaluating group differences on the NSSE and other surveys: Are the t-test and Cohen’s d indices the most appropriate choices. In: Annual Meeting of the Southern Association for Institutional Research

  • Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of the 29th International Conference on Software Engineering (ICSE). IEEE Computer Society, pp 499–510

  • Somasundaram K, Murphy GC (2012) Automatic categorization of bug reports using Latent Dirichlet Allocation. In: Proceedings of the 5th India Software Engineering Conference (ISEC). ACM, pp 125–130

  • Strzalkowski T, Lin F, Wang J, Perez-Carballo J (1999) Evaluating natural language processing techniques in information retrieval. In: Natural language information retrieval. Springer, pp 113–145

  • Sun C, Lo D, Wang X, Jiang J, Khoo SC (2010) A discriminative model approach for accurate duplicate bug report retrieval. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE). ACM, pp 45–54

  • Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, pp 253–262

  • Sun C, Le V, Zhang Q, Su Z (2016) Toward understanding compiler bugs in GCC and LLVM. In: Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA). ACM, New York, pp 294–305

  • Sureka A, Jalote P (2010) Detecting duplicate bug report using character n-gram-based features. In: Proceedings of the 17th Asia Pacific Software Engineering Conference (APSEC). IEEE Computer Society, pp 366–374

  • Taylor M, Zaragoza H, Craswell N, Robertson S, Burges C (2006) Optimisation methods for ranking functions with multiple parameters. In: CIKM 2006: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. ACM, pp 585–593

  • The Trac Project (2017) https://trac.edgewall.org/. Last visited on 11/12/2017

  • Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th International Conference on Software Engineering (ICSE). ACM, pp 461–470

  • Zhou J, Zhang H (2012) Learning to rank duplicate bug reports. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM). ACM, pp 852–861

  • Zou J, Xu L, Yang M, Zhang X, Zeng J, Hirokawa S (2016) Automated duplicate bug report detection using multi-factor analysis. IEICE Trans Inf Syst E99.D(7):1762–1775

Acknowledgments

This study would not have been possible without the High Performance Computing (HPC) systems that are shared by Compute Canada (Note 11) and the Center for Advanced Computing (Note 12), as well as the tools provided by Sun et al. (2011).

Author information

Corresponding author

Correspondence to Mohamed Sami Rakha.

Additional information

Communicated by: Burak Turhan

About this article

Cite this article

Rakha, M.S., Bezemer, CP. & Hassan, A.E. Revisiting the performance of automated approaches for the retrieval of duplicate reports in issue tracking systems that perform just-in-time duplicate retrieval. Empir Software Eng 23, 2597–2621 (2018). https://doi.org/10.1007/s10664-017-9590-5

  • DOI: https://doi.org/10.1007/s10664-017-9590-5
