Abstract
Software defect prediction helps optimize the allocation of testing resources by identifying defect-prone modules prior to testing. Most existing models build their prediction capability from a set of historical data, presumably drawn from the same or similar project settings as those under prediction. However, such historical data is not always available in practice. One potential way of predicting defects in projects without historical data is to learn predictors from the data of other projects. This paper investigates defect prediction in the cross-project context, focusing on the selection of training data. We conduct three large-scale experiments on 34 data sets obtained from 10 open source projects. The major conclusions from our experiments are: (1) in the best cases, training data from other projects can provide better prediction results than training data from the same project; (2) the prediction results obtained using training data from other projects meet our acceptance criteria on average: in 18 out of 34 cases, defects were predicted with a Recall greater than 70% and a Precision greater than 50%; (3) the results of cross-project defect prediction are related to the distributional characteristics of the data sets, which is valuable for training data selection. We further propose an approach to automatically select suitable training data for projects without historical data. The prediction results provided by the training data selected using our approach are comparable with those provided by training data from the same project.
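The acceptance criterion stated in the abstract (Recall greater than 70% and Precision greater than 50%) can be computed directly from confusion-matrix counts. The sketch below is illustrative only; the function names and thresholds-as-parameters are our own, not artifacts of the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision and Recall from confusion-matrix counts:
    tp = defective modules correctly flagged,
    fp = non-defective modules flagged (false alarms),
    fn = defective modules missed."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def meets_criteria(tp, fp, fn, min_recall=0.70, min_precision=0.50):
    """Check the paper's acceptance criterion:
    Recall > 70% and Precision > 50%."""
    precision, recall = precision_recall(tp, fp, fn)
    return recall > min_recall and precision > min_precision

# Example: 80 defects found, 60 false alarms, 20 defects missed.
p, r = precision_recall(80, 60, 20)
print(p, r)              # Precision ~0.571, Recall 0.8
print(meets_criteria(80, 60, 20))  # True: both thresholds met
```

A prediction with many false alarms can still pass this criterion as long as more than half of the flagged modules are truly defective, which is why the paper reports both measures together.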
Cite this article
He, Z., Shu, F., Yang, Y. et al. An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19, 167–199 (2012). https://doi.org/10.1007/s10515-011-0090-3