Abstract
The Forward Search is used in an exploratory manner, with many random starts, to indicate the number of clusters and their membership in continuous data. The prospective clusters can readily be distinguished from background noise and from other forms of outliers. A confirmatory Forward Search, with control of the sizes of the statistical tests, establishes precise cluster membership. The method performs as well as robust methods such as TCLUST, but it does not require prior specification of the number of clusters, nor of the level of trimming of outliers. In this sense it is “user friendly”.
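The authors' reference implementation is the FSDA MATLAB toolbox (Riani, Perrotta, & Torti 2012), cited in the references. Purely as an illustration of the exploratory idea described in the abstract, and not as the authors' algorithm, the following Python sketch runs one random-start Forward Search on multivariate data and records the trajectory of minimum Mahalanobis distances of units outside the fitting subset; over many random starts, searches initialised inside a genuine cluster yield trajectories whose peaks point to tentative clusters. The function name forward_search, the starting subset size m0, the ridge regularisation, and the simulated two-cluster data with background noise are illustrative assumptions.

```python
# A minimal sketch (not the authors' FSDA implementation) of one random-start
# Forward Search on multivariate data. The monitored quantity is the minimum
# squared Mahalanobis distance of the units outside the fitting subset.
import numpy as np

def forward_search(X, m0=5, rng=None):
    """One Forward Search from a random starting subset of size m0.

    Returns the trajectory of the minimum squared Mahalanobis distance of the
    units outside the fitting subset, for subset sizes m = m0, ..., n-1.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    subset = rng.choice(n, size=m0, replace=False)   # random initial subset
    min_md = []

    for m in range(m0, n):
        # Fit location and scatter on the current subset only.
        mu = X[subset].mean(axis=0)
        S = np.cov(X[subset], rowvar=False) + 1e-8 * np.eye(p)  # ridge for stability
        Sinv = np.linalg.inv(S)
        d2 = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)     # squared Mahalanobis

        # Record the smallest distance among units not yet in the subset.
        outside = np.setdiff1d(np.arange(n), subset)
        min_md.append(d2[outside].min())

        # Grow the subset: keep the m+1 units closest to the current fit.
        subset = np.argsort(d2)[:m + 1]

    return np.array(min_md)

# Example: two well-separated groups plus uniform background noise,
# searched from many random starts.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(8, 1, (80, 2)),
               rng.uniform(-5, 13, (20, 2))])
trajectories = [forward_search(X, m0=5, rng=s) for s in range(50)]
```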
References
Atkinson, A. C., & Riani, M. (2007). Exploratory tools for clustering multivariate data. Computational Statistics and Data Analysis, 52, 272–285.
Atkinson, A. C., Riani, M., & Cerioli, A. (2004). Exploring multivariate data with the forward search. New York: Springer.
Atkinson, A. C., Riani, M., & Cerioli, A. (2006). Random start forward searches with envelopes for detecting clusters in multivariate data. In S. Zani, A. Cerioli, M. Riani, & M. Vichi (Eds.), Data Analysis, Classification and the Forward Search (pp. 163–171). Berlin: Springer.
Cerioli, A., & Perrotta, D. (2014). Robust clustering around regression lines with high density regions. Advances in Data Analysis and Classification, 8, 5–26.
Coretto, P., & Hennig, C. (2010). A simulation study to compare robust clustering methods based on mixtures. Advances in Data Analysis and Classification, 4, 111–135.
Fowlkes, E. B., Gnanadesikan, R., & Kettenring, J. R. (1988). Variable selection in clustering. Journal of Classification, 5, 205–228.
Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97, 611–631.
Fritz, H., García-Escudero, L. A., & Mayo-Iscar, A. (2012). TCLUST: An R package for a trimming approach to cluster analysis. Journal of Statistical Software, 47, 1–26.
Gallegos, M. T., & Ritter, G. (2009). Trimming algorithms for clustering contaminated grouped data and their robustness. Advances in Data Analysis and Classification, 3, 135–167.
García-Escudero, L. A., Gordaliza, A., Matrán, C., & Mayo-Iscar, A. (2008). A general trimming approach to robust cluster analysis. Annals of Statistics, 36, 1324–1345.
García-Escudero, L. A., Gordaliza, A., Matrán, C., & Mayo-Iscar, A. (2010). A review of robust clustering methods. Advances in Data Analysis and Classification, 4, 89–109.
García-Escudero, L. A., Gordaliza, A., Matrán, C., & Mayo-Iscar, A. (2011). Exploring the number of groups in model-based clustering. Statistics and Computing, 21, 585–599.
Hennig, C., & Christlieb, N. (2002). Validating visual clusters in large datasets: Fixed point clusters of spectral features. Computational Statistics and Data Analysis, 40, 723–739.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data. An introduction to cluster analysis. New York: Wiley.
Lee, S. X., & McLachlan, G. J. (2013). Model-based clustering and classification with non-normal mixture distributions. Statistical Methods and Applications, 22, 427–454.
Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325–342.
Morelli, G. (2013). A comparison of different classification methods. Ph.D. dissertation, Università di Parma.
Riani, M., Atkinson, A. C., & Cerioli, A. (2009). Finding an unknown number of multivariate outliers. Journal of the Royal Statistical Society, Series B, 71, 447–466.
Riani, M., Perrotta, D., & Torti, F. (2012). FSDA: A MATLAB toolbox for robust analysis and interactive data exploration. Chemometrics and Intelligent Laboratory Systems, 116, 17–32.
Riani, M., Atkinson, A. C., & Perrotta, D. (2014). A parametric framework for the comparison of methods of very robust regression. Statistical Science, 29, 128–143.
Acknowledgements
We are very grateful to Berthold Lausen and Matthias Böhmer for their scientific and organizational support during the European Conference on Data Analysis 2013. We also thank an anonymous reviewer for careful reading of an earlier draft, and for pointing out the reference to Hennig and Christlieb (2002). Our work on this paper was partly supported by the project MIUR PRIN “MISURA—Multivariate Models for Risk Assessment”.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
Cite this paper
Atkinson, A.C., Cerioli, A., Morelli, G., Riani, M. (2015). Finding the Number of Disparate Clusters with Background Contamination. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_3
DOI: https://doi.org/10.1007/978-3-662-44983-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7