Time variance and defect prediction in software projects

Towards an exploitation of periods of stability and change as well as a notion of concept drift in software projects

Abstract

It is crucial for a software manager to know whether or not one can rely on a bug prediction model. A wrong prediction of the number or the location of future bugs can lead to problems in the achievement of a project’s goals. In this paper we first verify the existence of variability in a bug prediction model’s accuracy over time, both visually and statistically. Furthermore, we explore the reasons for such high variability over time, which includes periods of stability and variability of prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings we use data from four open source projects and empirically identify various project features that influence the defect prediction quality. Specifically, we observed that a change in the number of authors editing a file and the number of defects fixed by them influence the prediction quality. Finally, we introduce an approach to estimate the accuracy of prediction models that helps a project manager decide when to rely on a prediction model. Our findings suggest that one should be aware of the periods of stability and variability of prediction quality and should use approaches such as ours to assess their models’ accuracy in advance.

Notes

  1. http://www.nongnu.org/cvs/

  2. http://www.bugzilla.org/

  3. E.g., http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/ or http://www.cs.columbia.edu/~eaddy/concerntagger/.

  4. Tables can be found in Appendix A.

  5. Note that a complete description can be found in Appendix B and that, for all features where authorship is relevant, it is determined as the person committing the code into CVS rather than the developer noted in the comments of the code. However, most active contributors are committers of a project. For example, in the PDT project (http://www.eclipse.org/pdt/people/contributors.php#Seva-%28Wsevolod%29-Lapsha), 11 out of 12 participants are committers. Hence, this assumption will not have a great impact on the outcome of the experiments.

  6. E.g., how the individual committers’ coding behavior synchronizes towards a milestone.

  7. Note that we used the Mann–Whitney test because the test for normality (one-sample Kolmogorov–Smirnov test: p = 0.055) produced a borderline result. As some still use the t-test for large collections of slightly non-normal data, we also ran an independent-samples t-test and found it to be significant at α = 0.001.

  8. As above, a t-test reconfirmed these findings at α = 0.001.

  9. More precisely, we used FixCache as BugCache is only the theoretical model behind the method. Nevertheless, BugCache is the often-used term for both methods.

  10. Note that the observed number of models (162) that pick random features is significantly different from the expected number of models (1,425) according to a χ²-test (p < 0.001).

  11. A complete set of the figures can be found online in the technical report by Ekanayake et al. (2011): http://www.ifi.uzh.ch/research/publications/technical-reports.html.

References

  • Ancona D, Chong CL (1996) Entrainment: pace, cycle, and rhythm in organizational behavior. In: Research in organizational behavior, vol 18. JAI Press, Greenwich, pp 251–284

  • Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc YG (2008) Is it a bug or an enhancement?: a text-based approach to classify change requests. In: Proceedings of the 2008 conference of the Center for Advanced Studies on Collaborative Research (CASCON). ACM, New York, pp 304–318

  • Bachmann A, Bernstein A (2009) Data retrieval, processing and linking for software process data analysis. Tech. Rep. IFI-2009.0003, University of Zurich, Department of Informatics

  • Bernstein A, Ekanayake J, Pinzger M (2007) Improving defect prediction using temporal features and non linear models. In: IWPSE ’07: ninth international workshop on principles of software evolution, ACM, New York, pp 11–18. doi:10.1145/1294948.1294953

  • Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE). ACM, New York, pp 121–130

  • Brooks FP, Phillips F (1995) The mythical man-month: essays on software engineering. Addison-Wesley, Reading

  • Diehl S, Gall HC, Hassan AE (2009) Guest editors introduction: special issue on mining software repositories. Empir Software Eng 14(3):257–261

  • Eaddy M, Zimmermann T, Sherwood KD, Garg V, Murphy GC, Nagappan N, Aho AV (2008) Do crosscutting concerns cause defects? IEEE Trans Softw Eng 34(4):497–515

  • Ekanayake J, Tappolet J, Gall HC, Bernstein A (2011) Time variance and defect prediction in software projects—additional figures. Tech. Rep. IFI-2011.0004, University of Zurich, Department of Informatics

  • Fenton NE, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689. doi:10.1109/32.815326

  • Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661. doi:10.1109/32.859533

  • Hassan AE (2009) Predicting faults using the complexity of code changes. In: ICSE ’09: Proceedings of the 31st international conference on software engineering. IEEE Computer Society, Washington, DC, pp 78–88. doi:10.1109/ICSE.2009.5070510

  • Hassan AE, Holt RC (2005) The top ten list: dynamic fault prediction. In: ICSM ’05: Proceedings of the 21st IEEE international conference on software maintenance. IEEE Computer Society, Washington, DC, pp 263–272. doi:10.1109/ICSM.2005.91

  • Kagdi H, Collard ML, Maletic JI (2007) A survey and taxonomy of approaches for mining software repositories in the context of software evolution. J Softw Maint Evol 19(2):77–131. doi:10.1002/smr.344

  • Kenmei B, Antoniol G, Di Penta M (2008) Trend analysis and issue prediction in large-scale open source systems. In: Proc 12th European conference on software maintenance and reengineering CSMR 2008. IEEE Computer Society, Los Alamitos, pp 73–82. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4493302

  • Khoshgoftaar TM, Allen EB, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: ISSRE ’96: Proceedings of the the seventh international symposium on software reliability engineering. IEEE Computer Society, Washington, DC, p 364

  • Kim S, Zimmermann T, Whitehead Jr EJ, Zeller A (2007) Predicting faults from cached history. In: ICSE ’07: Proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, DC, pp 489–498. doi:10.1109/ICSE.2007.66

  • Knab P, Pinzger M, Bernstein A (2006) Predicting defect densities in source code files with decision tree learners. In: MSR ’06: Proceedings of the 2006 international workshop on mining software repositories. ACM, New York, pp 119–125. doi:10.1145/1137983.1138012

  • Ko AJ, Chilana PK (2010) How power users help and hinder open bug reporting. In: CHI ’10: Proceedings of the 28th international conference on human factors in computing systems. ACM, Atlanta, pp 1665–1674

  • Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496. doi:10.1109/TSE.2008.35

  • Li PL, Herbsleb J, Shaw M (2005) Forecasting field defect rates using a combined time-based and metrics-based approach: a case study of openbsd. In: ISSRE ’05: Proceedings of the 16th IEEE international symposium on software reliability engineering. IEEE Computer Society, Washington, DC, pp 193–202. doi:10.1109/ISSRE.2005.19

  • Mockus A, Votta LG (2000) Identifying reasons for software changes using historic databases. In: ICSM ’00: Proceedings of the international conference on software maintenance (ICSM’00). IEEE Computer Society, Washington, DC, p 120

  • Nagappan N, Ball T (2005) Static analysis tools as early indicators of pre-release defect density. In: ICSE ’05: Proceedings of the 27th international conference on software engineering. ACM, New York, NY, pp 580–586. doi:10.1145/1062455.1062558

  • Ostrand T, Weyuker E, Bell R (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355

  • Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203–231

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo

  • Tsymbal A (2004) The problem of concept drift: definitions and related work. Tech. rep., Department of Computer Science, Trinity College Dublin

  • Vorburger P, Bernstein A (2006) Entropy-based concept shift detection. In: ICDM ’06: Proceedings of the sixth international conference on data mining. IEEE Computer Society, Washington, DC, pp 1113–1118. doi:10.1109/ICDM.2006.66

  • Widmer G, Kubat M (1993) Effective learning in dynamic environments by explicit context tracking. In: ECML ’93: Proceedings of the European conference on machine learning. Springer, London, pp 227–243

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Mateo

  • Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: PROMISE ’07: Proceedings of the third international workshop on predictor models in software engineering. IEEE Computer Society, Washington, DC, p 9. doi:10.1109/PROMISE.2007.10

Acknowledgements

We would like to thank the anonymous reviewers for their many detailed comments that helped us to significantly improve the paper.

Author information

Corresponding author

Correspondence to Jayalath Ekanayake.

Additional information

Editors: Jim Whitehead and Michael Godfrey

J. Ekanayake was partially supported by funding from the Sabaragamuwa University of Sri Lanka while J. Tappolet was partially supported by funding from the Swiss National Science Foundation award number 200021–112330.

Appendices

Appendix A: Component List

Tables 13, 14 and 15 list all the investigated components and the number of files that each component consists of.

Table 13 Eclipse: Investigated components and number of files
Table 14 Netbeans: Investigated components and number of files
Table 15 Mozilla: Investigated components and number of files

Appendix B: Detailed Feature Description

This section describes the features used in the paper and explains their computation and rationale.

revision   We consider a revision as a change made to a file for some reason. The feature revision represents the number of changes made to a file during training periods. Both Graves et al. (2000) and Khoshgoftaar et al. (1996) found that past changes are good defect indicators.

activityRate   This feature measures how often a file has been revised during the training periods and is computed by dividing the number of revisions during the training period by the length of the training period (in months). Hassan and Holt (2005) concluded that a high frequency of changes in a file is a good defect predictor.
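For illustration, here is a minimal Python sketch of how revision and activityRate could be computed from a file's revision dates. The data representation (a plain list of datetime objects per file) and the month arithmetic are our own simplifications, not the paper's tooling.

```python
from datetime import datetime

def revision_and_activity_rate(revision_dates, training_start, training_end):
    """Compute the features revision and activityRate for one file.

    revision_dates: list of datetime objects, one per committed revision
    training_start, training_end: bounds of the training period
    """
    in_period = [d for d in revision_dates if training_start <= d < training_end]
    revisions = len(in_period)
    # approximate length of the training period in months
    months = max(1, round((training_end - training_start).days / 30))
    return revisions, revisions / months

# hypothetical revision dates for one file
dates = [datetime(2005, 1, 10), datetime(2005, 2, 3), datetime(2005, 2, 20)]
print(revision_and_activity_rate(dates, datetime(2005, 1, 1), datetime(2005, 4, 1)))
```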

lineAdded, lineDeleted and totalLineOperations   Several studies showed that past changes are good defect predictors (Graves et al. 2000; Khoshgoftaar et al. 1996). Therefore, we further quantify the amount of change done by authors using the features lineAdded and lineDeleted, which describe the number of lines of code added and deleted during the training periods. Further, we capture the total amount of work done for a revision by adding those two features, resulting in the feature totalLineOperations.

grownPerMonth   This feature provides information about the growth rate of a project or file in the training periods. We compute the amount of growth using the total number of lines added and deleted during that time period: we subtract the total number of lines deleted from the total number of lines added and then average this value by dividing it by the length of the training period (in months). Therefore, this number can be either positive (representing growth) or negative (representing shrinkage). We introduced this feature to address issues that may arise from overly fast change.

lineOperationRRevision   This feature captures the average size of a revision in terms of the number of lines of code added and deleted. We simply add the total number of lines of code added and deleted during the training periods and divide that amount by the number of revisions during that period.
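The line-based features above reduce to a few sums and ratios over per-revision line counts. The following sketch illustrates one possible computation, assuming each revision is given as a (lines added, lines deleted) pair; this representation is hypothetical and not taken from the paper.

```python
def line_features(revisions, training_months):
    """Compute lineAdded, lineDeleted, totalLineOperations, grownPerMonth and
    lineOperationRRevision for one file.

    revisions: list of (lines_added, lines_deleted) pairs, one per revision
               in the training period (hypothetical representation)
    training_months: length of the training period in months
    """
    line_added = sum(a for a, _ in revisions)
    line_deleted = sum(d for _, d in revisions)
    total = line_added + line_deleted
    # net growth per month; may be negative when more lines were deleted than added
    grown_per_month = (line_added - line_deleted) / training_months
    # average size of a revision; guard against files with no revisions
    per_revision = total / len(revisions) if revisions else 0.0
    return {
        "lineAdded": line_added,
        "lineDeleted": line_deleted,
        "totalLineOperations": total,
        "grownPerMonth": grown_per_month,
        "lineOperationRRevision": per_revision,
    }

print(line_features([(120, 30), (15, 60), (5, 0)], training_months=3))
```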

chanceRevision and chanceBug   These two features provide the probability of having a revision or a bug in a file in the future. They mimic the award-winning BugCache approach (Kim et al. 2007), which proposes that more recently fixed files are more vulnerable to bugs. We model this probability using the formula \(1/2^i\), where i represents how far back (in months) the latest revision or bug occurred from the prediction time period. If the latest revision or bug occurrence is far from the prediction time period, then i is large and the overall probability of having a bug (or revision) in the near future is low.
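As a small illustration of the \(1/2^i\) decay, the sketch below computes the score from the month of the latest event and the month of the prediction period; the function name and the month indexing are our own, not the paper's.

```python
def chance_of_event(last_event_month, prediction_month):
    """Probability-like score that an event (revision or bug) recurs soon.

    Months are counted from the project start, so
    i = prediction_month - last_event_month is the age of the latest event.
    """
    i = prediction_month - last_event_month
    if i < 0:
        raise ValueError("the event must precede the prediction period")
    return 1.0 / (2 ** i)

# an event last month (i = 1) yields 0.5, one three months ago (i = 3) yields 0.125
print(chance_of_event(11, 12), chance_of_event(9, 12))
```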

blockerFixes, criticalFixes, majorFixes, minorFixes, normalFixes and trivialFixes   These six features report the number of different types of bugs fixed during the training periods. The bugs are categorized according to their severity: blocker, critical, major, minor, normal and trivial. The severity of fixed bugs can be found in the Bugzilla database. If a revision has a referenced or linked entry in the Bugzilla database and the severity of that entry is marked as one of the above categories, we consider the revision to be a bug-fixing activity. Further, if the bug-fixing revision date falls into the training period, we count one bug as fixed in the assigned category. Our intention in introducing these features is to uncover any correlation between the severity and defects.
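A sketch of how such severity-wise fix counts could be derived, assuming each revision record already carries the severity of its linked Bugzilla entry (or None when no bug report is linked); this record layout is hypothetical rather than the paper's schema.

```python
from collections import Counter
from datetime import date

SEVERITIES = {"blocker", "critical", "major", "minor", "normal", "trivial"}

def severity_fix_counts(revisions, training_start, training_end):
    """Count bug-fixing revisions per severity within the training period."""
    counts = Counter()
    for rev in revisions:
        severity = rev.get("bug_severity")  # None when no Bugzilla entry is linked
        if severity in SEVERITIES and training_start <= rev["date"] < training_end:
            counts[severity + "Fixes"] += 1
    return dict(counts)

# hypothetical revision records for one file
revs = [
    {"date": date(2005, 2, 1), "bug_severity": "major"},
    {"date": date(2005, 2, 15), "bug_severity": None},
    {"date": date(2005, 3, 1), "bug_severity": "major"},
]
print(severity_fix_counts(revs, date(2005, 1, 1), date(2005, 4, 1)))  # {'majorFixes': 2}
```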

enhancementFixes   This feature counts the number of revisions made for enhancements requested during the training period of the models. In the bug categorization process, we found that some requests are not for fixing bugs but for enhancements. Hence, we introduce the feature enhancementFixes, which counts such fixed enhancements.

blockerReported, criticalReported, majorReported, minorReported, normalReported and trivialReported   These six features provide information about the number of reported bugs in terms of severity. We introduce these features as not all bugs reported during a training period may be fixed within that period. Note that we consider the opening date and the reporting date to be the same. If an opening date falls into the training period, we count one bug as reported in the assigned category.

enhancementReported   This feature counts the number of enhancements reported during the training periods. The reporting date is determined as above.

p1-fixes, p2-fixes, p3-fixes, p4-fixes and p5-fixes   Each bug report is further categorized based on its priority, such that the highest and the lowest priority bugs are categorized as P1 and P5, respectively; the other priorities fall in between. These five features describe the number of bugs of each priority fixed during the training periods. Bug-fixing dates are determined as in the above cases. If a bug-fixing date falls into the training period, we count one bug as fixed in the assigned category.

p1-reported, p2-reported, p3-reported, p4-reported and p5-reported   These five features provide information about the number of bugs reported with the corresponding priority during the training periods. The reporting dates are determined as above.

lineAddedI, lineDeletedI and totalLineOperationsI   These three features provide information about the lines of code added, deleted, and the total lines of code operated on (or changed) to fix bugs during the training periods. If a revision has a referenced entry or link in the Bugzilla database and the corresponding bug report is not marked as an enhancement but has a severity level, then we consider that revision to be a bug-fixing activity. Furthermore, the information in the CVS log allows us to extract how many lines of code were added and deleted for that revision, supplying the basis for lineAddedI and lineDeletedI. Adding these two features results in totalLineOperationsI. These three variables capture how much work (in terms of the number of lines of code) is accomplished by the authors to fix bugs.

lineOperationIRBugFixes   This feature measures the average number of lines of code changed to fix bugs during the training periods. Thus, this feature captures the size of the bugs fixed and reveals any correlation between the average size of fixed bugs and the defects. We derive the feature lineOperationIRBugFixes by dividing the total number of lines changed to fix bugs by the total number of bugs fixed.

lineOperationIRTotalLines   This feature describes the work effort of the authors to fix bugs relative to their other work during the training periods. We already computed the total number of lines changed (or operated on) to fix bugs and for other activities such as enhancements. Hence, we can derive this feature by dividing the total number of lines changed to fix bugs by the total number of lines changed for any other activity.
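A minimal sketch of the two ratio features, with guards against division by zero; the argument names are ours and merely stand in for the aggregates described above.

```python
def bug_fix_ratios(lines_changed_for_bug_fixes, bugs_fixed, lines_changed_other):
    """lineOperationIRBugFixes and lineOperationIRTotalLines with zero guards."""
    avg_fix_size = lines_changed_for_bug_fixes / bugs_fixed if bugs_fixed else 0.0
    relative_effort = (
        lines_changed_for_bug_fixes / lines_changed_other if lines_changed_other else 0.0
    )
    return avg_fix_size, relative_effort

# e.g. 450 lines changed across 9 bug fixes, 1500 lines changed for other work
print(bug_fix_ratios(450, 9, 1500))  # (50.0, 0.3)
```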

lifeTimeBlocker, lifeTimeCritical, lifeTimeMajor, lifeTimeMinor, lifeTimeNormal and lifeTimeTrivial   These six features describe the lifetime of the different types of bugs fixed during the training periods. Both the Bugzilla and CVS databases provide information about the opening and closing dates of bugs, and Bugzilla provides the severity level of a bug. Consequently, we can compute the lifetime of any type of bug by taking the difference between the closing and the opening dates. Note that even when the opening dates lie outside the considered training periods we use them to compute the bug lifetimes.
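One possible way to compute such lifetimes, assuming hypothetical bug records that carry a severity, an opening date, and a closing date; averaging the lifetimes per severity is one plausible aggregation, not necessarily the paper's exact definition.

```python
from datetime import date

def lifetimes_by_severity(bugs, training_start, training_end):
    """Average lifetime (in days) of fixed bugs, grouped by severity.

    Only bugs closed within the training period count; the opening date may lie
    before the training period, as noted above.
    """
    per_severity = {}
    for bug in bugs:
        if training_start <= bug["closed"] < training_end:
            days = (bug["closed"] - bug["opened"]).days
            per_severity.setdefault(bug["severity"], []).append(days)
    return {sev: sum(days) / len(days) for sev, days in per_severity.items()}

# hypothetical bug records
bugs = [
    {"severity": "critical", "opened": date(2005, 1, 2), "closed": date(2005, 1, 20)},
    {"severity": "minor", "opened": date(2004, 12, 1), "closed": date(2005, 2, 1)},
]
print(lifetimes_by_severity(bugs, date(2005, 1, 1), date(2005, 3, 1)))
```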

hasBug   This is the target variable of some of our models. It describes whether any kind of bug (blocker, critical, major, minor, normal, or trivial) has been reported in the target period.

Appendix C: Dataset Format

This section describes the format of the datasets used in the first experiment.

A dataset consists of two parts, a labeling part and a feature-computation part. The labeling period is usually one month long; in this period we record the number of bugs reported—the target variable—for each observed file. The feature-computation period—the training period—can extend from one month up to the maximum length of the observed period; it starts one month before the labeling period and extends into the past. In this period we compute the features listed in Table 2 for each file for which we recorded the number of bugs reported during the labeling period. The description of each feature can be found in the section above. The following is the mathematical notation of the dataset:

Assume that the observed period is d months. Let \(Y_T = \{y_{T,1}, y_{T,2}, \ldots, y_{T,j}, \ldots, y_{T,s}\}\) be a vector of dimension s (s is the number of observed files), where \(y_{T,j}\) is the number of bugs reported for file j at time T, with \(1 < T \le d\). If \(X_t = \{f_{t,1}, f_{t,2}, \ldots, f_{t,i}, \ldots, f_{t,n}\}\) is a feature vector of dimension n, where \(f_{t,i}\) is file feature i computed from the history information at time t, \(n \in \mathbb{N}\), \(1 < t \le d-1\), \(t < T\), and \(s \gg n\), then the constructed dataset is given by \(\left(\sum_{t=x}^{T-1} X_t, Y_T\right)\). By changing the variables x and T we can generate different datasets.
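The following sketch mirrors this notation: features are computed per month and summed over the training months x..T−1, while the label for each file is the bug count of month T. The two callables are placeholders for the CVS/Bugzilla extraction step and are not part of the paper.

```python
def build_dataset(file_ids, feature_for_month, bugs_in_month, x, T):
    """Assemble one dataset following the notation above.

    feature_for_month(file_id, t) -> list of n feature values for month t
    bugs_in_month(file_id, T)     -> number of bugs reported for the file in month T
    Feature vectors are summed element-wise over the training months x..T-1;
    the label is taken from month T.
    """
    X, Y = [], []
    for f in file_ids:
        monthly = [feature_for_month(f, t) for t in range(x, T)]
        X.append([sum(values) for values in zip(*monthly)])  # element-wise sum
        Y.append(bugs_in_month(f, T))
    return X, Y
```

Sliding x and T across the observed period then yields the different datasets mentioned above.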

About this article

Cite this article

Ekanayake, J., Tappolet, J., Gall, H.C. et al. Time variance and defect prediction in software projects. Empir Software Eng 17, 348–389 (2012). https://doi.org/10.1007/s10664-011-9180-x
