Using text clustering to predict defect resolution time: a conceptual replication and an evaluation of prediction accuracy

Abstract

Defect management is a central task in software maintenance. When a defect is reported, appropriate resources must be allocated to analyze and resolve it. An important issue in resource allocation is the estimation of Defect Resolution Time (DRT). Prior research has considered different approaches to DRT prediction that exploit information retrieval techniques and the similarity of textual defect descriptions. In this article, we investigate the potential of text clustering for DRT prediction. We build on a study by Raja (2013), which demonstrated that clusters of similar defect reports had statistically significant differences in DRT and suggested that these differences could be used for DRT prediction. Our aims are twofold: first, to conceptually replicate Raja’s study and assess the repeatability of its results in different settings; second, to investigate the potential of textual clustering of issue reports for DRT prediction, with a focus on accuracy. Using different data sets and a different text mining tool and clustering technique, we first conduct an independent replication of the original study. We then design a fully automated clustering-based prediction method and assess its accuracy in a simulated test scenario. The results of our independent replication are comparable to those of the original study, and we confirm the initial findings regarding significant differences in DRT between clusters of defect reports. However, the simulated test scenario yields poor results in terms of DRT prediction accuracy. Thus, although our replication confirms the main finding of the original study, our attempt to use text clustering as the basis for DRT prediction did not achieve practically useful levels of accuracy.
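To make the prediction idea concrete, the following is a minimal sketch in Python using scikit-learn, assuming a TF-IDF text representation, k-means clustering, and the cluster median DRT as the prediction rule. The study itself used RapidMiner and a different clustering technique, so all data values, parameters, and identifiers below are illustrative assumptions, not the authors' actual pipeline.

# Hypothetical sketch: cluster defect-report texts, then predict a new
# report's DRT from the historical DRTs of its assigned cluster.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy stand-in data; in the study, report texts and resolution times
# come from issue trackers.
train_texts = [
    "crash on startup when config file is missing",
    "null pointer exception in report generator",
    "UI freezes while loading a large project",
    "typo in settings dialog label",
]
train_drt_days = np.array([12.0, 30.0, 45.0, 2.0])

# 1. Represent defect descriptions as TF-IDF vectors.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

# 2. Cluster the historical reports (k is an arbitrary choice here).
k = 2
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)

# 3. Summarize each cluster's DRT distribution, e.g., by its median.
cluster_drt = {
    c: float(np.median(train_drt_days[kmeans.labels_ == c]))
    for c in range(k)
}

def predict_drt(report_text):
    # Assign the new report to a cluster; return that cluster's median DRT.
    vec = vectorizer.transform([report_text])
    return cluster_drt[int(kmeans.predict(vec)[0])]

# Simulated use: a "new" report whose actual DRT is known in hindsight.
print(predict_drt("application crashes when opening a corrupted config file"))

In a simulated test scenario of this kind, predictions are compared against the actual resolution times of held-out reports; it is at this step that the clustering-based approach failed to reach practically useful accuracy.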

Notes

  1. http://serg.cs.lth.se/research/experiment-packages/clustering-defect-reports/

  2. The RapidMiner process is exported to a file available in the same repository as the raw data (http://serg.cs.lth.se/research/experiment-packages/clustering-defect-reports/)

References

  • AbdelMoez W, Kholief M, Elsalmy FM (2013) Improving bug fix-time prediction model by filtering out outliers. Proceedings of the Int’l Conf. on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE). IEEE Computer Society, pp 359–364. doi:10.1109/TAEECE.2013.6557301

  • Anbalagan P, Vouk M (2009) On predicting the time taken to correct bug reports in open source projects. Proceedings of the IEEE Int’l Conf. on Software Maintenance (ICSM’09). IEEE Computer Society, pp 523–526. doi:10.1109/ICSM.2009.5306337

  • Batet M (2011) Ontology-based semantic clustering. AI Commun 24:291–292. doi:10.3233/AIC-2011-0501

  • Bettenburg N, Nagappan M, Hassan AE (2012) Think locally, act globally: Improving defect and effort prediction models. Proceedings of the 9th IEEE Working Conf. on Mining Software Repositories (MSR’12). IEEE Computer Society, pp 60–69. doi:10.1109/MSR.2012.6224300

  • Bhattacharya P, Neamtiu I (2011) Bug-fix time prediction models: can we do better? Proceedings of the 8th Working Conf. on Mining Software Repositories (MSR’11). ACM, New York, NY, USA, pp 207–210. doi:10.1145/1985441.1985472

  • Boehm B, Basili VR (2001) Software defect reduction top 10 list. Computer 34:135–137

  • Borg M (2014) Embrace your issues: compassing the software engineering landscape using bug reports. Presented at the Doctoral Symposium, 29th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE’14), Sept. 15th, 2014, Västerås, Sweden

  • Borg M, Runeson P (2013) IR in software traceability: from a bird’s eye view. Proceedings of the 7th International Symposium on Empirical Software Engineering and Measurement (ESEM’13), pp 243–246

  • Borg M, Gotel OCZ, Wnuk K (2013) Enabling traceability reuse for impact analyses: a feasibility study in a safety context. Proceedings of the Int’l Workshop on Traceability in Emerging Forms of Software Eng. (TEFSE’13). IEEE Computer Society, pp 72–78. doi:10.1109/TEFSE.2013.6620158

  • Borg M, Pfahl D, Runeson P (2013) Analyzing networks of issue reports. Proceedings of the 17th European Conf. on Software Maintenance and Reengineering (CSMR’13). pp 79–88

  • Bougie G, Treude C, German DM, Storey M (2010) A comparative exploration of FreeBSD bug lifetimes. Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR’10). IEEE Computer Society, pp 106–109. doi:10.1109/MSR.2010.5463291

  • Brooks A, Roper M, Wood M et al (2008) Replication’s role in software engineering. In: Shull F, Singer J, Sjøberg DIK (eds) Guide to advanced empirical software engineering. Springer, London, pp 365–379

  • Carver JC, Juristo N, Baldassarre MT, Vegas S (2014) Replications of software engineering experiments. Empir Softw Eng 19:267–276. doi:10.1007/s10664-013-9290-8

  • Chen T-H, Thomas SW, Nagappan M, Hassan A (2012) Explaining software defects using topic models. Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR’12). pp 189–198. doi:10.1109/MSR.2012.6224280

  • D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR’10). pp 31–41. doi:10.1109/MSR.2010.5463279

  • Deerwester S, Dumais S, Furnas G, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • Dietterich T (2002) Machine learning for sequential data: a review. In: Structural, syntactic, and statistical pattern recognition – Proceedings of the Joint IAPR International Workshops SSPR 2002 and SPR 2002, pp 15–30

  • Dubes R (1993) Cluster analysis and related issues. In: Chen C, Pau L, Wang P (eds) Handbook of pattern recognition and computer vision. World Scientific Publishing, pp 3–32

  • Frost HR, Moore JH (2014) Optimization of gene set annotations via entropy minimization over variable clusters (EMVC). Bioinformatics btu110:1–9. doi:10.1093/bioinformatics/btu110

  • Giger E, Pinzger M, Gall H (2010) Predicting the fix time of bugs. Proceedings 2nd Int. Workshop on Recommendation Systems for Software Eng. (RSSE’10). ACM, New York, NY, USA, pp 52–56

  • Gómez OS, Juristo N, Vegas S (2014) Understanding replication of experiments in software engineering: a classification. Inf Softw Technol 56(8):1033–1048. doi:10.1016/j.infsof.2014.04.004

  • González-Barahona J, Robles G (2012) On the reproducibility of empirical software engineering studies based on data retrieved from development repositories. Empir Softw Eng 17:75–89. doi:10.1007/s10664-011-9181-9

  • Hassan A (2008) The road ahead for mining software repositories. Frontiers of Software Maintenance (FoSM 2008). IEEE Computer Society, pp 48–57.

  • Hofmann M, Klinkenberg R (2014) RapidMiner: data mining use cases and business analytics applications. CRC Press

  • IEC (2014) IEC 61511–1 ed1.0. http://webstore.iec.ch/webstore/webstore.nsf/artnum/031559

  • Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31:651–666. doi:10.1016/j.patrec.2009.09.011

  • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31:264–323. doi:10.1145/331499.331504

  • Juristo N, Gómez OS (2012) Replication of software engineering experiments. In: Meyer B, Nordio M (eds) Empirical software engineering and verification. Springer, Berlin/Heidelberg, pp 60–88

  • Keung J, Kitchenham B (2008) Experiments with analogy-X for software cost estimation. Proceedings of the 19th Australian Conf. on Software Engineering (ASWEC’08). pp 229–238

  • Kim S, Whitehead Jr EJ (2006) How long did it take to fix bugs? Proceedings of the Int. Workshop on Mining Software Repositories (MSR’06). ACM, New York, USA, pp 173–174

  • Kitchenham BA (2008) The role of replications in empirical software engineering—a word of warning. Empir Softw Eng 13:219–221. doi:10.1007/s10664-008-9061-0

  • Kontostathis A (2007) Essential dimensions of Latent Semantic Indexing (LSI). Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS ’07), pp 73–80. doi:10.1109/HICSS.2007.213

  • Lamkanfi A, Demeyer S (2012) Filtering bug reports for fix-time analysis. Proceedings of the 16th European Conf. on Software Maintenance and Reengineering (CSMR’12). IEEE Computer Society, pp 379–384. doi:10.1109/CSMR.2012.47

  • Laukkanen EI, Mäntylä MV (2011) Survey reproduction of defect reporting in industrial software development. Proceedings of the Int’l Symposium on Empirical Software Engineering and Measurement (ESEM’11). pp 197–206

  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34:485–496. doi:10.1109/TSE.2008.35

  • Lilliefors HW (1967) On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J Am Stat Assoc 62:399–402

  • Marks L, Zou Y, Hassan AE (2011) Studying the fix-time for bugs in large open source projects. Proceedings of the 7th Int. Conf. on Predictive Models in Software Engineering. ACM, New York, NY, USA, pp 11:1–11:8

  • Matejka J, Li W, Grossman T, Fitzmaurice G (2009) CommunityCommands: command recommendations for software applications. Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology (UIST’09), pp 193–202

  • McCandless M, Hatcher E, Gospodnetic O (2010) Lucene in action, second edition. Manning Publications Co., Greenwich, CT, USA. http://www.manning.com/hatcher3/

  • Menzies T, Shepperd M (2012) Special issue on repeatable results in software engineering prediction. Empir Softw Eng 17:1–17. doi:10.1007/s10664-011-9193-5

  • Menzies T, Bird C, Zimmermann T, et al. (2011) The inductive software engineering manifesto: principles for industrial data mining. Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering. ACM, New York, NY, USA, pp 19–26

  • Menzies T, Butcher A, Marcus A, et al. (2011) Local vs. global models for effort estimation and defect prediction. 26th IEEE/ACM Int. Conf. on Automated Software Engineering (ASE). pp 343–351. doi:10.1109/ASE.2011.6100072

  • Miller J (2005) Replicating software engineering experiments: a poisoned chalice or the Holy Grail. Inf Softw Technol 47:233–244. doi:10.1016/j.infsof.2004.08.005

  • Müller M, Pfahl D (2008) Simulation methods. In: Shull F, Singer J, Sjøberg DIK (eds) Guide to Advanced Empirical Software Engineering. Springer, London, pp 117–152

  • Panjer LD (2007) Predicting eclipse bug lifetimes. Proceedings of the 4th IEEE Working Conf. on Mining Software Repositories (MSR’07). IEEE Computer Society, p 29. doi:10.1109/MSR.2007.25

  • Perry DE, Porter AA, Votta LG (2000) Empirical studies of software engineering: a roadmap. Proceedings of the Conference on The Future of Software Engineering. ACM, New York, NY, USA, pp 345–355

  • Raja U (2013) All complaints are not created equal: text analysis of open source software defect reports. Empir Softw Eng 18:117–138. doi:10.1007/s10664-012-9197-9

  • Robinson B, Francis P (2010) Improving industrial adoption of software engineering research: a comparison of open and closed source software. Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. ACM, New York, NY, USA, pp 21:1–21:10. doi:10.1145/1852786.1852814

  • Rosenthal R (1991) Replication in behavioral research. In: Neuliep JW (ed) Replication research in the social sciences. SAGE Publications Inc., Newbury Park, pp 1–30

  • Sawilowsky SS, Blair RC (1992) A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychol Bull 111:352–360. doi:10.1037/0033-2909.111.2.352

  • Shepperd M, Kadoda G (2001) Using simulation to evaluate prediction techniques [for software]. Proceedings of the 7th Int’l Software Metrics Symposium (METRICS 2001). IEEE Computer Society, pp 349–359

  • Shihab E, Kamei Y, Bhattacharya P (2012) Mining challenge 2012: the android platform. Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR’12). IEEE Computer Society, pp 112–115

  • Shull F, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13:211–218. doi:10.1007/s10664-008-9060-1

  • Singhal A (2001) Modern information retrieval: a brief overview. Data Eng Bull 24(2):1–9

  • Strate JD, Laplante PA (2013) A literature review of research in software defect reporting. IEEE Trans Reliab 62:444–454. doi:10.1109/TR.2013.2259204

  • Su T, Dy J (2004) A deterministic method for initializing K-means clustering. 16th IEEE Int. Conf. on Tools with Artificial Intelligence (ICTAI 2004). pp 784–786. doi:10.1109/ICTAI.2004.7

  • Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology (NIST), USA

  • Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B (Stat Methodol) 63:411–423. doi:10.1111/1467-9868.00293

  • Walker R, Holmes R (2014) Simulation - a methodology to evaluate recommendation systems in software engineering. In: Robillard M, Maalej W, Walker R, Zimmermann T (eds) Recommendation systems in software engineering. Springer, London, pp 301–327

  • Wang D, Wang Q, Yang Y, et al. (2011) “Is it really a defect?” An empirical study on measuring and improving the process of software defect reporting. Proceedings of the Int’l Symposium on Empirical Software Engineering and Measurement (ESEM’11). IEEE Computer Society, pp 434–443. doi:10.1109/ESEM.2011.62

  • Weick KE (1995) What theory is not, theorizing is. Adm Sci Q 40:385–390

  • Weiss C, Premraj R, Zimmermann T, Zeller A (2007) How long will it take to fix this bug? Proceedings of the 4th Int. Workshop on Mining Software Repositories (MSR’07). IEEE Computer Society. doi:10.1109/MSR.2007.13

  • Wohlin C, Runeson P, Höst M et al (2012) Experimentation in software engineering. Springer, Berlin/Heidelberg

  • Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678. doi:10.1109/TNN.2005.845141

  • Zeller A (2009) Why programs fail: a guide to systematic debugging, 2nd ed. Morgan Kaufmann Publishers.

  • Zhang H, Gong L, Versteeg S (2013) Predicting bug-fixing time: an empirical study of commercial software projects. Proceedings of the 2013 International Conf. on Software Eng. (ICSE’13). IEEE Computer Society, Piscataway, NJ, USA, pp 1042–1051. doi:10.1109/ICSE.2013.6606654

Acknowledgments

This research was partly funded by the institutional research grant IUT20-55 of the Estonian Research Council and the Industrial Excellence Center EASE – Embedded Applications Software Engineering, Sweden.

Author information

Corresponding author

Correspondence to Saïd Assar.

Additional information

Communicated by: Thomas Zimmermann

About this article

Cite this article

Assar, S., Borg, M. & Pfahl, D. Using text clustering to predict defect resolution time: a conceptual replication and an evaluation of prediction accuracy. Empir Software Eng 21, 1437–1475 (2016). https://doi.org/10.1007/s10664-015-9391-7
