Robust Double Clustering: A Method Based on Alternating Concentration Steps

Journal of Classification

Abstract

We propose two algorithms for robust two-mode partitioning of a data matrix in the presence of outliers. First, we extend the robust k-means procedure to biclustering. We then slightly relax the definition of outlier and propose a more flexible and parsimonious strategy, which is, however, inherently less robust. We discuss the breakdown properties of the algorithms and illustrate the methods with simulations and three real examples.
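
For orientation, the concentration-step idea behind a trimmed (robust) k-means can be sketched as below. This is a generic one-mode illustration, not the double-clustering algorithms of the paper, which are not reproduced on this page. The function name `trimmed_kmeans`, its parameters (`alpha` as the trimming fraction, `init` as optional starting centroids), and the use of label `-1` for trimmed points are all assumptions made for the example.

```python
import random


def trimmed_kmeans(points, k, alpha, init=None, n_iter=50, seed=0):
    """Illustrative trimmed k-means (hypothetical helper, not the paper's code).

    Returns (centroids, labels); label -1 marks a trimmed point.
    """
    rng = random.Random(seed)
    n = len(points)
    n_keep = n - int(alpha * n)  # points retained at each concentration step
    # Assumption: start from user-supplied centroids, else from k random points.
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    labels = [-1] * n
    for _ in range(n_iter):
        # Squared distance of every point to its closest centroid.
        nearest = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            j = min(range(k), key=d2.__getitem__)
            nearest.append((d2[j], j))
        # Concentration step: keep only the n_keep points closest to a centroid.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        new_labels = [-1] * n
        for i in order[:n_keep]:
            new_labels[i] = nearest[i][1]
        # Update each centroid as the mean of its retained points.
        for j in range(k):
            members = [points[i] for i in range(n) if new_labels[i] == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels
    return centroids, labels
```

With two tight groups and one gross outlier, calling `trimmed_kmeans(points, k=2, alpha=0.12, init=[(0, 0), (10, 10)])` on nine points trims a single observation. The result depends on the starting centroids, so in practice such a procedure is usually restarted from several initial solutions.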

Author information

Corresponding author

Correspondence to Alessio Farcomeni.

Additional information

The author is grateful to four referees for detailed suggestions that led to an improved paper, and to Professor Vichi for support and careful reading of a first draft. Acknowledgements go also to Francesca Martella for advice.

About this article

Cite this article

Farcomeni, A. Robust Double Clustering: A Method Based on Alternating Concentration Steps. J Classif 26, 77–101 (2009). https://doi.org/10.1007/s00357-009-9026-z

  • DOI: https://doi.org/10.1007/s00357-009-9026-z
