Robust Double Clustering: A Method Based on Alternating Concentration Steps

Farcomeni, Alessio

doi:10.1007/s00357-009-9026-z

Published: 05 April 2009

Volume 26, pages 77–101 (2009)
Journal of Classification

Alessio Farcomeni¹

Abstract

We propose two algorithms for robust two-mode partitioning of a data matrix in the presence of outliers. First, we extend the robust k-means procedure to the case of biclustering; then we slightly relax the definition of outlier and propose a more flexible and parsimonious strategy, which is, however, inherently less robust. We discuss the breakdown properties of the algorithms and illustrate the methods with simulations and three real examples.
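The abstract describes the first algorithm only in outline, so the following is a hypothetical sketch, not the author's implementation. It assumes a double k-means model in which every kept cell is approximated by the mean of its (row group, column group) block, a squared-error loss, and trimming of whole rows and whole columns in fixed proportions `alpha` and `beta`. Each iteration alternates a row concentration step (block means and column partition held fixed: give each row its best row group, keep the rows with the smallest cost) with the symmetric column step, so once the trimmed sets reach their final sizes the objective never increases. The function names, the seed-row/seed-column random starts, and all parameters are assumptions made for this illustration.

```python
import random


def _block_means(X, rows, cols, rlab, clab, k, m):
    """Mean of X over each (row group, column group) block, kept cells only."""
    sums = [[0.0] * m for _ in range(k)]
    counts = [[0] * m for _ in range(k)]
    for i in rows:
        for j in cols:
            sums[rlab[i]][clab[j]] += X[i][j]
            counts[rlab[i]][clab[j]] += 1
    # An empty block gets mean 0.0, an arbitrary choice made for this sketch.
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0 for b in range(m)]
            for a in range(k)]


def _concentrate(n_units, n_keep, costs_of):
    """Concentration step: best label for every unit (row or column), then keep
    the n_keep units whose best cost is smallest; the others are trimmed."""
    labels, best = [], []
    for u in range(n_units):
        costs = costs_of(u)
        a = min(range(len(costs)), key=costs.__getitem__)
        labels.append(a)
        best.append(costs[a])
    keep = sorted(range(n_units), key=best.__getitem__)[:n_keep]
    return labels, sorted(keep)


def trimmed_double_kmeans(X, k, m, alpha, beta, n_starts=20, max_iter=50, seed=0):
    """Hypothetical sketch: k row groups and m column groups, trimming a fraction
    alpha of the rows and beta of the columns. Returns (objective, row labels,
    column labels, kept rows, kept columns) for the best random start."""
    n, p = len(X), len(X[0])
    n_keep, p_keep = n - int(alpha * n), p - int(beta * p)
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        # Random start: k seed rows and m seed columns, one per group.
        rows, cols = rng.sample(range(n), k), rng.sample(range(p), m)
        rlab, clab = [0] * n, [0] * p
        for a, i in enumerate(rows):
            rlab[i] = a
        for b, j in enumerate(cols):
            clab[j] = b
        mu = _block_means(X, rows, cols, rlab, clab, k, m)
        obj = float("inf")
        for _ in range(max_iter):
            # Row step: block means and column partition held fixed.
            rlab, rows = _concentrate(n, n_keep, lambda i: [
                sum((X[i][j] - mu[a][clab[j]]) ** 2 for j in cols) for a in range(k)])
            mu = _block_means(X, rows, cols, rlab, clab, k, m)
            # Column step: block means and row partition held fixed.
            clab, cols = _concentrate(p, p_keep, lambda j: [
                sum((X[i][j] - mu[rlab[i]][b]) ** 2 for i in rows) for b in range(m)])
            mu = _block_means(X, rows, cols, rlab, clab, k, m)
            new_obj = sum((X[i][j] - mu[rlab[i]][clab[j]]) ** 2
                          for i in rows for j in cols)
            converged = new_obj >= obj - 1e-12  # no further decrease
            obj = new_obj
            if converged:
                break
        if best is None or obj < best[0]:
            best = (obj, rlab, clab, rows, cols)
    return best
```

For instance, on an 8 x 6 matrix with two row groups and two column groups, padded with one outlying row of 100s and one outlying column of -50s, `trimmed_double_kmeans(X, k=2, m=2, alpha=0.15, beta=0.15, n_starts=100)` should trim exactly that row and that column. Several random starts are used because, as with ordinary trimmed k-means, concentration steps only guarantee a local optimum.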

Author information

Authors and Affiliations

University of Rome “La Sapienza”, Piazzale Aldo Moro, 5, 00186, Roma, Italy
Alessio Farcomeni

Corresponding author

Correspondence to Alessio Farcomeni.

Additional information

The author is grateful to four referees for detailed suggestions that led to an improved paper, and to Professor Vichi for support and careful reading of a first draft. Acknowledgements also go to Francesca Martella for advice.

About this article

Cite this article

Farcomeni, A. Robust Double Clustering: A Method Based on Alternating Concentration Steps. J Classif 26, 77–101 (2009). https://doi.org/10.1007/s00357-009-9026-z

Issue Date: April 2009
DOI: https://doi.org/10.1007/s00357-009-9026-z