
On the dataset shift problem in software engineering prediction models

Empirical Software Engineering

Abstract

A core assumption of any prediction model is that the distribution of the test data does not differ from that of the training data. Prediction models used in software engineering are no exception. In reality, this assumption can be violated in many ways, resulting in inconsistent and non-transferable observations across different cases. The goal of this paper is to explain the phenomenon of conclusion instability through the concept of dataset shift, from a software effort and fault prediction perspective. Different types of dataset shift are explained with examples from software engineering, and techniques for addressing the associated problems are discussed. While dataset shifts in the form of sample selection bias and imbalanced data are well known in software engineering research, understanding the other types is relevant for interpreting non-transferable results across different sites and studies. The software engineering community should be aware of, and account for, dataset shift related issues when evaluating the validity of research outcomes.
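
To make the shift assumption concrete: one common countermeasure for covariate shift (an input distribution that changes between training and test data while the input-output relation stays the same) is importance weighting, where training instances that resemble the test population are up-weighted. The sketch below is a minimal illustration under stated assumptions, not the paper's method; scikit-learn, the logistic-regression density-ratio estimator, and the synthetic project features are all illustrative choices.

```python
# Minimal sketch (illustrative, not from the paper): importance weighting for
# covariate shift. A probabilistic classifier that separates training-domain
# from test-domain inputs yields density-ratio weights for the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "within-company" training features and a "cross-company" test
# population whose input distribution has drifted (covariate shift).
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 3))  # e.g. size/complexity metrics
X_test = rng.normal(loc=0.7, scale=1.2, size=(150, 3))   # shifted project population

# Label each row by its origin: 0 = training domain, 1 = test domain.
X_both = np.vstack([X_train, X_test])
domain = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

# P(test | x) from the domain classifier gives the density ratio
# p_test(x) / p_train(x) up to a constant factor.
domain_clf = LogisticRegression(max_iter=1000).fit(X_both, domain)
p_test = domain_clf.predict_proba(X_train)[:, 1]
weights = p_test / np.clip(1.0 - p_test, 1e-6, None)

print(weights[:5])  # pass as sample_weight when fitting the prediction model
```

Projects unlike the target population receive small weights, so a downstream effort or fault prediction model fitted with these weights (via its sample_weight argument, where supported) is steered toward the region of the input space that the test data actually occupies.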


Notes

  1. For simplicity, it is assumed that such changes have no confounding effects other than their effect on software size.

  2. However, this would introduce an unfair bias, since it would mean using, during the model construction phase, information about the very attribute that is to be predicted (i.e. the defect rate). The model is supposed to predict that attribute in the first place and should be blind to such prior facts about the test data; see the sketch after these notes.

  3. Domain shift is not included in the discussion, since it is a measurement-related issue that should be handled separately by the researcher/practitioner.

  4. In practice, this warning applies to simulation studies, since test responses are typically not known in real settings.
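
As a concrete illustration of note 2, the snippet below contrasts a leaky "correction" for prior probability shift, which reads the defect rate off the test labels, with a fair model that never sees them. It is a hedged sketch under assumptions: the Gaussian naive Bayes learner, scikit-learn, and the synthetic data are placeholders, not the paper's experimental setup.

```python
# Sketch of the leakage pitfall in note 2 (illustrative assumptions only).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 4))
y_train = (rng.random(300) < 0.15).astype(int)  # ~15% defective modules at training site
X_test = rng.normal(size=(100, 4))
y_test = (rng.random(100) < 0.40).astype(int)   # test site happens to have ~40% defects

# Unfair: class priors taken from the very labels the model is meant to predict.
leaky_priors = np.bincount(y_test, minlength=2) / len(y_test)
leaky_model = GaussianNB(priors=leaky_priors).fit(X_train, y_train)

# Fair: the model sees only training labels; any prior-shift correction would
# have to be estimated from unlabeled test inputs or external knowledge.
fair_model = GaussianNB().fit(X_train, y_train)
```

Because the leaky variant encodes knowledge of the test outcome distribution, any evaluation of it overstates what the model could achieve in practice, where that distribution is unknown at prediction time.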



Acknowledgements

This research is partly funded by the Finnish Funding Agency for Technology and Innovation (TEKES) under the Cloud Software Program. The author would like to thank the anonymous reviewers for their suggestions, which greatly improved the paper.

Author information

Corresponding author

Correspondence to Burak Turhan.

Additional information

Editors: Martin Shepperd and Tim Menzies


About this article

Cite this article

Turhan, B. On the dataset shift problem in software engineering prediction models. Empir Software Eng 17, 62–74 (2012). https://doi.org/10.1007/s10664-011-9182-8

