ABSTRACT
This keynote discusses the need for more robust statistical methods. For visualizing data, I suggest kernel density plots rather than box plots. For parametric analysis, I propose more robust measures of central location, such as trimmed means, which support reliable tests of differences between the central locations of two or more samples. I also recommend non-parametric effect sizes, such as Cliff's δ and Brunner and Munzel's p-hat, that avoid some of the problems with rank-based non-parametric methods.
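As a minimal sketch of the quantities named in the abstract, the Python below computes a 20% trimmed mean, Cliff's δ, and a p-hat estimate of P(X < Y) + 0.5 P(X = Y). The function names, the 20% trimming proportion, and the sample data are illustrative assumptions, not taken from the keynote, and the direction convention for p-hat varies between authors.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping floor(proportion * n) values from each tail.

    The 20% default is an assumption made for this sketch.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number of values cut from each tail
    return mean(ordered[k:len(ordered) - k])


def cliffs_delta(x, y):
    """Cliff's delta: estimate of P(X > Y) - P(X < Y) over all pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def p_hat(x, y):
    """Estimate of P(X < Y) + 0.5 * P(X = Y) over all pairs.

    Assumed direction convention; some sources define the mirror image.
    """
    less = sum(1 for a in x for b in y if a < b)
    ties = sum(1 for a in x for b in y if a == b)
    return (less + 0.5 * ties) / (len(x) * len(y))


# Hypothetical data: the control sample contains one extreme outlier (95).
control = [12, 14, 15, 15, 16, 18, 95]
treatment = [17, 19, 20, 21, 22, 24, 25]

print("mean (control):        ", mean(control))
print("trimmed mean (control):", trimmed_mean(control))
print("Cliff's delta:         ", cliffs_delta(control, treatment))
print("p-hat:                 ", p_hat(control, treatment))
```

In this made-up sample the single outlier pulls the ordinary mean of the control group above the treatment values, while the trimmed mean stays close to the bulk of the data. With the conventions used here, the two effect sizes are linked by δ = 1 - 2 × p-hat, so they carry the same information on different scales.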
- Andrea Arcuri and Lionel Briand (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. ACM/IEEE International Conference on Software Engineering (ICSE).
- G. E. P. Box (1954) Some theorems on quadratic forms applied in the study of analysis of variance problems: I. Effect of inequality of variance in the one-way model. Annals of Mathematical Statistics, 25, pp. 290--302.
- Edgar Brunner and Ullrich Munzel and Madan L. Puri (2002) The multivariate nonparametric Behrens-Fisher problem. Journal of Statistical Planning and Inference, 108, pp. 37--52.
- Norman Cliff (1993) Dominance Statistics: Ordinal Analyses to Answer Ordinal Questions. Psychological Bulletin, 114(3), pp. 494--509.
- Mohamed El-Attar (2014) Using SMCD to reduce inconsistencies in misuse case models: A subject-based empirical evaluation. The Journal of Systems and Software, 87, pp. 104--118.
- Barbara Kitchenham (1996) Software Metrics: Measurement for Software Process Improvement. NCC Blackwell.
- Jeffrey D. Kromrey and Kristine Y. Hogarty and John M. Ferron and Constance V. Hines and Melinda R. Hess (2005) Robustness in Meta-Analysis: An Empirical Comparison of Point and Interval Estimates of Standardized Mean Differences and Cliff's Delta. Joint Statistical Meetings. Minneapolis, MN.
- L. Madeyski and M. Jureczko (2014) Which Process Metrics Can Significantly Improve Defect Prediction Models? An Empirical Study. Software Quality Journal, published online.
- Philip H. Ramsey (1980) Exact type 1 error rates for robustness of Student's t test with unequal variances. Journal of Educational and Behavioral Statistics, 5(4), pp. 337--346.
- Andrew F. Tappenden and James Miller (2014) Automated cookie collection testing. ACM Transactions on Software Engineering and Methodology (TOSEM), 23(1).
- B. L. Welch (1938) The Significance of the Difference Between Two Means when the Population Variances are Unequal. Biometrika, 29(3/4), pp. 350--362.
- Rand R. Wilcox (2012) Modern Statistics for the Social and Behavioral Sciences. CRC Press.
- Rand R. Wilcox (2012) Introduction to Robust Estimation and Hypothesis Testing. 3rd Edition. Academic Press/Elsevier.
- Rand R. Wilcox and H. J. Keselman (2003) Modern Robust Data Analysis Methods: Measures of Central Tendency. Psychological Methods, 8(3), pp. 254--274.
- Rand R. Wilcox and Ventura L. Charlin and Karen L. Thompson (1986) New Monte Carlo results on the robustness of the ANOVA F, W and F* statistics. Communications in Statistics - Simulation and Computation, 15(4), pp. 933--943.
- András Vargha and Harold D. Delaney (2000) A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), pp. 101--132.
- K. K. Yuen (1974) The two-sample trimmed t for unequal population variances. Biometrika, 61, pp. 165--170.
- D. W. Zimmerman (2000) Statistical Significance Levels of Nonparametric Tests Biased by Heterogeneous Variances of Treatment Groups. The Journal of General Psychology, 127(4).
- D. W. Zimmerman (1993) Rank transformations and the power of the Student t test and Welch t test for non-normal populations with unequal variances. Canadian Journal of Experimental Psychology, 47(3), pp. 523--539.
Index Terms
- Robust statistical methods: why, what and how: keynote