
Global vs. local models for cross-project defect prediction

A replication study

Published in: Empirical Software Engineering

Abstract

Although researchers have invested significant effort, the performance of defect prediction in a cross-project setting, i.e., with training data that does not come from the same project, is still unsatisfactory. A recent proposal for improving defect prediction is the use of local models. With local models, the available data is first clustered into homogeneous regions and afterwards a separate classifier is trained for each region. Since the main problem of cross-project defect prediction is data heterogeneity, the idea of local models is promising. Therefore, we perform a conceptual replication of the previous studies on local models with a focus on cross-project defect prediction. In a large case study, we evaluate the performance of local models and investigate their advantages and drawbacks for cross-project predictions. To this aim, we also compare their performance with that of a global model and a transfer learning technique designed for cross-project defect prediction. Our findings show that local models make only a minor difference in comparison to global models and transfer learning for cross-project defect prediction. While these results are negative, they provide valuable knowledge about the limitations of local models and increase the validity of previously gained research results.
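The local-model workflow described above, cluster first, then train one classifier per region, can be summarized in a few lines of code. The following is a minimal sketch under assumptions that are not part of the study itself: it uses scikit-learn, EM-based Gaussian mixture clustering as a stand-in for the clustering algorithms of the replicated studies, and logistic regression as an arbitrary example classifier.

```python
# Minimal sketch of the local-model idea: cluster the training data into
# homogeneous regions and train one classifier per region; test instances are
# routed to the classifier of their region. Illustrative only: scikit-learn,
# Gaussian mixture (EM-style) clustering, and logistic regression stand in for
# the algorithms used in the replicated studies.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture


def train_local_model(X_train, y_train, n_clusters=3):
    clustering = GaussianMixture(n_components=n_clusters, random_state=0)
    regions = clustering.fit_predict(X_train)  # EM-based clustering of the training data
    classifiers = {}
    for region in np.unique(regions):
        mask = regions == region
        # Note: clusters that contain only one class would need special handling
        # (e.g., a constant prediction) in a real implementation.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[mask], y_train[mask])  # one classifier per homogeneous region
        classifiers[region] = clf
    return clustering, classifiers


def predict_local_model(clustering, classifiers, X_test):
    regions = clustering.predict(X_test)  # assign each test instance to a region
    return np.array([classifiers[r].predict(x.reshape(1, -1))[0]
                     for r, x in zip(regions, X_test)])
```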


Notes

  1. With the data used in the study and the success criterion of having both recall and precision of at least 0.75, they achieved a success rate of about 3 %.

  2. The studies by Menzies et al. and Bettenburg et al. were first published in an initial version at a conference and then in greater detail in a journal publication, leading to five publications for the three studies.

  3. Recall, precision, and accuracy all at least 0.75.

  4. The tera-PROMISE repository is the successor of the PROMISE repository, which was previously located at http://promisedata.googlecode.com.

  5. http://bug.inf.usi.ch/

  6. Instead of recall, the terms PD or tpr are sometimes used in the literature. PD stands for probability of defect and tpr for true positive rate; the definitions of these measures are restated after these notes.

  7. This problem is still very relevant. For example, during the 37th International Conference on Software Engineering held in May 2015, there were five papers on defect prediction (Caglayan et al. 2015; Ghotra et al. 2015; Peters et al. 2015; Tan et al. 2015; Tantithamthavorn et al. 2015). None of them used exactly the same performance measures.

  8. GitHub: https://github.com/sherbold/replication-kit-emse-2016-local-models/tree/master/replication-kit; zipped archive: http://hdl.handle.net/21.11101/0000-0001-3C55-D
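Since notes 1, 3, and 6 refer to recall, precision, accuracy, and the 0.75 success criterion, the snippet below restates the standard confusion-matrix definitions. It is a plain illustration, not code from the study, and the function names are our own.

```python
# Standard confusion-matrix measures referenced in notes 1, 3, and 6, together
# with the "at least 0.75" success criterion. tp, fp, tn, fn are the counts of
# true/false positives and negatives.
def recall(tp, fn):        # also called PD or tpr in the literature
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def meets_success_criterion(tp, fp, tn, fn, threshold=0.75):
    return (recall(tp, fn) >= threshold
            and precision(tp, fp) >= threshold
            and accuracy(tp, fp, tn, fn) >= threshold)
```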

References

  • Amasaki S, Kawata K, Yokogawa T (2015) Improving cross-project defect prediction methods with data simplification. In: 41st Euromicro conference on software engineering and advanced applications (SEAA)

  • Bettenburg N, Nagappan M, Hassan A (2012) Think locally, act globally: improving defect and effort prediction models. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR). IEEE Computer Society

  • Bettenburg N, Nagappan M, Hassan A (2014) Towards improving statistical modeling of software engineering data: think locally, act globally! Empir Softw Eng:1–42

  • Caglayan B, Turhan B, Bener A, Habayeb M, Miranskyy A, Cialini E (2015) Merits of organizational metrics in defect prediction: an industrial replication. In: Proceedings of the 37th international conference on software engineering (ICSE)

  • Camargo Cruz AE, Ochimizu K (2009) Towards logistic regression models for predicting fault-prone code across software projects. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement (ESEM). IEEE Computer Society

  • Carver JC (2010) Towards reporting guidelines for experimental replications: a proposal. In: Proceedings of the international workshop on replication in empirical software engineering

  • Chidamber S, Kemerer C (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  • D’Ambros M, Lanza M, Robbes R (2010) An Extensive Comparison of Bug Prediction Approaches. In: Proceedings of the 7th IEEE working conference on mining software repositories (MSR). IEEE Computer Society

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J of the Royal Statistical Society Series B (Methodological) 39(1):1–38

  • Drummond C, Holte RC (2003) C4.5, class imbalance and cost sensitivity: why under-sampling beats over-sampling. In: Workshop on learning from imbalanced datasets II

  • Faloutsos C, Lin KI (1995) Fastmap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. SIGMOD Rec 24(2):163–174

  • Fraley C, Raftery AE (1999) MCLUST: software for model-based cluster analysis. J Classif 16(2):297–306

  • Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE)

  • Gray D, Bowes D, Davey N, Sun Y, Christianson B (2011) The misuse of the NASA metrics data program data sets for automated software defect prediction. In: Proceedings of the 15th annual conference on evaluation & assessment in software engineering (EASE). IET

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1):10–18

  • Halstead MH (1977) Elements of software science (operating and programming systems series). Elsevier Science Inc.

  • Han J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann

  • Hassan A (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE), pp 78–88. doi:10.1109/ICSE.2009.5070510

  • He P, Li B, Ma Y (2014) Towards cross-project defect prediction with imbalanced feature sets. CoRR arXiv:1411.4228

  • He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19:167–199

  • He Z, Peters F, Menzies T, Yang Y (2013) Learning from Open-Source projects: an empirical study on defect prediction. In: Proceedings of the 7th international symposium on empirical software engineering and measurement (ESEM)

  • Henderson-Sellers B (1996) Object-oriented metrics; measures of complexity. Prentice-Hall

  • Herbold S (2013) Training data selection for cross-project defect prediction. In: Proceedings of the 9th international conference on predictive models in software engineering (PROMISE), ACM

  • Herbold S (2015) Crosspare: a tool for benchmarking cross-project defect predictions. In: Proceedings of the 4th international workshop on software mining (SoftMine)

  • Huang L, Port D, Wang L, Xie T, Menzies T (2010) Text mining in supporting software systems risk assurance. In: Proceedings of the 25th IEEE/ACM international conference on automated software engineering(ASE), ACM

  • Jelihovschi E, Faria J, Allaman I (2014) Scottknott: a package for performing the Scott-Knott clustering algorithm in R. TEMA (São Carlos) 15:3–17

  • Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Softw Eng 13(5):561–595

  • Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering (PROMISE), ACM

  • Kawata K, Amasaki S, Yokogawa T (2015) Improving relevancy filter methods for cross-project defect prediction. In: 3rd international conference on applied computing and information technology/2nd international conference on computational science and intelligence (ACIT-CSI)

  • Kitchenham B (2008) The role of replications in empirical software engineering word of warning. Empir Softw Eng 13(2):219–221

  • Kocaguneli E, Menzies T, Keung J, Cok D, Madachy R (2013) Active learning and effort estimation: Finding the essential content of software effort estimation data. IEEE Trans Softw Eng 39(8):1040–1053. doi:10.1109/TSE.2012.88

  • Kotsiantis S, Kanellopoulos D, Pintelas P (2006) Data preprocessing for supervised leaning. Int J Comp Sci 1(2):111–117

  • Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256

  • Madeyski L, Jureczko M (2015) Which process metrics can significantly improve defect prediction models? an empirical study. Softw Qual J 23(3):393–422. doi:10.1007/s11219-014-9241-7

  • Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60

  • McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320

  • Meneely A, Williams L, Snipes W, Osborne J (2008) Predicting failures with developer networks and social network analysis. In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering, ACM, New York, NY, USA, SIGSOFT ’08/FSE-16, pp 13–23. doi:10.1145/1453101.1453106

  • Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proceedings of the 4th international workshop on predictor models in software engineering (PROMISE), ACM

  • Menzies T, Butcher A, Marcus A, Zimmermann T, Cok D (2011) Local vs. global models for effort estimation and defect prediction. In: Proceedings of the 26th IEEE/ACM international conference on automated software engineering (ASE), IEEE Computer Society

  • Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus global lessons for defect prediction and effort estimation. IEEE Trans Softw Eng 39(6):822–834

  • Menzies T, Pape C, Steele C (2014) The tera-PROMISE repository. http://openscience.us/repo/

  • Nam J, Kim S (2015) Heterogeneous defect prediction. In: Proceedings of the 10th joint meeting of the european software engineering conference (ESEC) and the ACM SIGSOFT symposium on the foundations of software engineering (FSE). doi:10.1145/2786805.2786814

  • Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: Proceedings of the 35th international conference on software engineering (ICSE)

  • Ngomo ACN (2009) Low-bias extraction of domain-specific concepts. Ph.D. Thesis

  • Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359

  • Peters F, Menzies T, Gong L, Zhang H (2013) Balancing privacy and utility in cross-company defect prediction. IEEE Trans Softw Eng 39(8):1054–1068

  • Peters F, Menzies T, Layman L (2015) LACE2: better privacy-preserving data sharing for cross project defect prediction. In: Proceedings of the 37th international conference on software engineering (ICSE)

  • Premraj R, Herzig K (2011) Network versus code metrics to predict defects: a replication study. In: Proceedings of the international symposium on empirical software engineering and measurement (ESEM)

  • Rahman F, Posnett D, Devanbu P (2012) Recalling the “imprecision” of cross-project defect prediction. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering (FSE). ACM

  • Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131–164

  • Scanniello G, Gravino C, Marcus A, Menzies T (2013) Class level fault prediction using software clustering. In: Proceedings of the 28th IEEE/ACM international conference on automated software engineering (ASE). IEEE Computer Society

  • Schikuta E (1993) Grid-clustering: a hierarchical clustering method for very large data sets. In: Proceedings of the 15th international conference on pattern recognition

  • Schölkopf B, Smola AJ (2002) Learning with Kernels. MIT Press

  • Scott AJ, Knott M (1974) A cluster analysis method for grouping means in the analysis of variance. Biometrics 30(3):507–512

  • Shepperd M, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215

  • Shull F, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13(2):211–218

  • Siegmund J, Siegmund N, Apel S (2015) Views on internal and external validity in empirical software engineering. In: 37th International conference on software engineering

  • Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the 37th international conference on software engineering (ICSE)

  • Tantithamthavorn C, McIntosh S, Hassan AE, Ihara A, Matsumoto Ki (2015) The impact of mislabelling on the performance and interpretation of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE)

  • Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering. doi:10.1145/2884781.2884857. ACM

  • Turhan B, Menzies T, Bener A, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14:540–578

  • van Gestel T, Suykens J, Baesens B, Viaene S, Vanthienen J, Dedene G, de Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5–32

  • Watanabe S, Kaiya H, Kaijiri K (2008) Adapting a fault prediction model to allow inter language reuse. In: Proceedings of the 4th international workshop on predictor models in software engineering (PROMISE). ACM

  • Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678

  • Zhang F, Mockus A, Keivanloo I, Zou Y (2014) Towards building a universal defect prediction model. In: Proceedings of the 11th working conference on mining software repositories (MSR). ACM

  • Zhang F, Mockus A, Keivanloo I, Zou Y (2015) Towards building a universal defect prediction model with rank transformed predictors. Empir Softw Eng:1–39. doi:10.1007/s10664-015-9396-2

  • Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the european software engineering conference (ESEC) and the ACM SIGSOFT symposium on the foundations of software engineering (FSE). ACM, pp 91–100

Author information

Corresponding author

Correspondence to Steffen Herbold.

Additional information

Communicated by: Burak Turhan

Appendix A: Metrics

A.1 JSTAT Data

The following metrics are part of the JSTAT data:

  • WMC: weighted method count, number of methods in a class

  • DIT: depth of inheritance tree

  • NOC: number of children

  • CBO: coupling between objects, number of classes coupled to a class

  • RFC: response for class, number of different methods that can be executed if the class receives a message

  • LCOM: lack of cohesion in methods, number of methods not related through the sharing of some of the class fields

  • LCOM3: lack of cohesion in methods after Henderson-Sellers (1996)

  • NPM: number of public methods

  • DAM: data access metric, ratio of private (protected) attributes to total number of attributes in the class

  • MOA: measure of aggregation, number of class fields whose types are user defined classes

  • MFA: measure of functional abstraction, ratio of the number of methods inherited by a class to the total number of methods accessible by the member methods of the class

  • CAM: cohesion among methods of class, relatedness of methods based upon the parameter list of the methods

  • IC: inheritance coupling, number of parent classes to which the class is coupled

  • CBM: coupling between methods, number of new/redefined methods to which all the inherited methods are coupled

  • AMC: average method complexity

  • Ca: afferent couplings

  • Ce: efferent couplings

  • CC: cyclomatic complexity

  • Max(CC): maximum cyclomatic complexity among methods

  • Avg(CC): average cyclomatic complexity among methods

For a detailed explanation see Jureczko and Madeyski (2010).
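As an illustration of how data with these metrics is typically consumed, the sketch below loads a metrics file into a feature matrix and a binary defect label. The CSV layout, the lower-case column names, and the "bug" count column are assumptions for illustration, not guaranteed by the study; see Jureczko and Madeyski (2010) for the authoritative description of the data.

```python
# Hedged sketch: load a JSTAT-style metrics file into a feature matrix X and a
# binary defect label y. The file layout, the lower-case column names, and the
# "bug" count column are assumptions for illustration.
import pandas as pd

METRIC_COLUMNS = ["wmc", "dit", "noc", "cbo", "rfc", "lcom", "lcom3", "npm",
                  "dam", "moa", "mfa", "cam", "ic", "cbm", "amc", "ca", "ce",
                  "cc", "max_cc", "avg_cc"]

def load_metrics_dataset(path):
    data = pd.read_csv(path)
    present = [c for c in METRIC_COLUMNS if c in data.columns]
    X = data[present].to_numpy()
    y = (data["bug"] > 0).astype(int).to_numpy()  # defective iff at least one bug
    return X, y
```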

A.2 MDP Data

The following metrics are part of the MDP data. This is the common subset of metrics that is available for all projects within the MDP data set:

  • LOC_TOTAL: total lines of code

  • LOC_EXECUTABLE: executable lines of code

  • LOC_COMMENTS: lines of comments

  • LOC_CODE_AND_COMMENT: lines with comments or code

  • NUM_UNIQUE_OPERATORS: number of unique operators

  • NUM_UNIQUE_OPERANDS: number of unique operands

  • NUM_OPERATORS: total number of operators

  • NUM_OPERANDS: total number of operands

  • HALSTEAD_VOLUME: Halstead volume (see Halstead 1977)

  • HALSTEAD_LENGTH: Halstead length (see Halstead 1977)

  • HALSTEAD_DIFFICULTY: Halstead difficulty (see Halstead 1977)

  • HALSTEAD_EFFORT: Halstead effort (see Halstead 1977)

  • HALSTEAD_ERROR_EST: Halstead Error, also known as Halstead Bug (see Halstead 1977)

  • HALSTEAD_PROG_TIME: Halstead programming time (see Halstead 1977); the standard formulas behind these Halstead measures are sketched after this list

  • BRANCH_COUNT: Number of branches

  • CYCLOMATIC_COMPLEXITY: Cyclomatic complexity (same as CC in the JSTAT data)

  • DESIGN_COMPLEXITY: design complexity
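
The Halstead measures above are all derived from the four basic operator and operand counts. The sketch below restates the textbook formulas (Halstead 1977); the constant of 18 seconds per unit of effort and the volume-based error estimate are common conventions and may differ from the exact definitions used by the MDP tooling.

```python
# Textbook Halstead measures (Halstead 1977) derived from the four basic counts
# in the MDP data: n1 = NUM_UNIQUE_OPERATORS, n2 = NUM_UNIQUE_OPERANDS,
# N1 = NUM_OPERATORS, N2 = NUM_OPERANDS.
import math

def halstead_measures(n1, n2, N1, N2):
    vocabulary = n1 + n2
    length = N1 + N2                           # HALSTEAD_LENGTH
    volume = length * math.log2(vocabulary)    # HALSTEAD_VOLUME
    difficulty = (n1 / 2) * (N2 / n2)          # HALSTEAD_DIFFICULTY
    effort = difficulty * volume               # HALSTEAD_EFFORT
    prog_time = effort / 18                    # HALSTEAD_PROG_TIME in seconds (assumed constant)
    error_est = volume / 3000                  # HALSTEAD_ERROR_EST (E**(2/3)/3000 is also common)
    return {"length": length, "volume": volume, "difficulty": difficulty,
            "effort": effort, "prog_time": prog_time, "error_est": error_est}
```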

A.3 JPROC Data

The following metrics are part of the JPROC data:

  • CBO: coupling between objects

  • DIT: depth of inheritance tree

  • fanIn: number of other classes that reference the class

  • fanOut: number of other classes referenced by the class

  • LCOM: lack of cohesion in methods

  • NOC: number of children

  • RFC: response for class

  • WMC: weighted method count

  • NOA: number of attributes

  • NOAI: number of attributes inherited

  • LOC: lines of code

  • NOM: number of methods

  • NOMI: number of methods inherited

  • NOPRA: number of private attributes

  • NOPRM: number of private methods

  • NOPA: number of public attributes

  • NOPM: number of public methods

  • NR: number of revisions

  • NREF: number of times the file has been refactored

  • NAUTH: number of authors

  • LADD: sum of lines added

  • max(LADD): maximum lines added

  • avg(LADD): average lines added

  • LDEL: sum of lines removed

  • max(LDEL): maximum lines deleted

  • avg(LDEL): average lines deleted

  • CHURN: sum of code churn

  • max(CHURN): maximum code churn

  • avg(CHURN): average code churn

  • AGE: age of the file

  • WAGE: weighted age of the file

For a detailed explanation see D’Ambros et al. (2010).
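The change-based metrics in this list (NR, NAUTH, LADD, LDEL, CHURN, and their max/avg variants) are aggregations over a file's revision history. The sketch below illustrates that aggregation pattern; the churn definition used here (lines added plus lines deleted per revision) is an assumption for illustration, see D’Ambros et al. (2010) for the definitions actually used.

```python
# Hedged sketch of aggregating a file's revision history into JPROC-style
# process metrics. Each revision is (author, lines_added, lines_deleted);
# churn = lines added + lines deleted is an assumed definition for illustration.
def process_metrics(revisions):
    added = [a for _, a, _ in revisions]
    deleted = [d for _, _, d in revisions]
    churn = [a + d for a, d in zip(added, deleted)]

    def agg(values, name):
        return {name: sum(values),
                "max(%s)" % name: max(values),
                "avg(%s)" % name: sum(values) / len(values)}

    metrics = {"NR": len(revisions),                                  # number of revisions
               "NAUTH": len({author for author, _, _ in revisions})}  # number of authors
    metrics.update(agg(added, "LADD"))
    metrics.update(agg(deleted, "LDEL"))
    metrics.update(agg(churn, "CHURN"))
    return metrics
```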

About this article

Cite this article

Herbold, S., Trautsch, A. & Grabowski, J. Global vs. local models for cross-project defect prediction. Empir Software Eng 22, 1866–1902 (2017). https://doi.org/10.1007/s10664-016-9468-y
