Abstract
The Gaussian rank correlation equals the usual correlation coefficient computed from the normal scores of the data. Although its influence function is unbounded, it still has attractive robustness properties. In particular, its breakdown point is above 12%. Moreover, the estimator is consistent and asymptotically efficient at the normal distribution. The correlation matrix obtained from pairwise Gaussian rank correlations is always positive semidefinite and is very easy to compute, even in high dimensions. We compare the properties of the Gaussian rank correlation with the popular Kendall and Spearman correlation measures. A simulation study confirms the good efficiency and robustness properties of the Gaussian rank correlation. In the empirical application, we show how it can be used for multivariate outlier detection based on robust principal component analysis.
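The definition in the first sentence of the abstract can be illustrated with a minimal sketch. It assumes that the normal score of an observation is Φ⁻¹(rank / (n + 1)), a common convention that the abstract itself does not spell out, and that the data contain no ties. The function names `normal_scores`, `pearson`, and `gaussian_rank_correlation` are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map observations to normal scores Phi^{-1}(rank / (n + 1)).

    Assumption: no ties, and the rank / (n + 1) scaling convention.
    """
    n = len(values)
    # Indices of the observations sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(u, v):
    """Ordinary (Pearson) correlation coefficient of two sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def gaussian_rank_correlation(x, y):
    """Pearson correlation computed on the normal scores of x and y."""
    return pearson(normal_scores(x), normal_scores(y))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [10.0, 20.0, 30.0, 40.0, 1e6]  # last value is a gross outlier
    print(gaussian_rank_correlation(x, y))
```

Because the estimator depends on the data only through ranks, replacing the largest `y` value by an arbitrarily large number leaves the result unchanged, which gives some intuition for the robustness properties discussed in the abstract.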
Cite this article
Boudt, K., Cornelissen, J. & Croux, C. The Gaussian rank correlation estimator: robustness properties. Stat Comput 22, 471–483 (2012). https://doi.org/10.1007/s11222-011-9237-0