Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison

Kovaleva, Ekaterina V.; Mirkin, Boris G.

doi:10.1007/s00357-015-9186-y

Published: 02 October 2015

Journal of Classification, Volume 32, pages 414–442 (2015)



Abstract

The paper presents a least squares framework for divisive clustering. Two popular divisive clustering methods, Bisecting K-Means and Principal Direction Division (PDD), turn out to be versions of the same least squares approach. PDD has recently been enhanced with a stopping criterion that takes into account the minima of the corresponding one-dimensional density function (the dePDDP method). We extend this approach to Bisecting K-Means by projecting the data onto random directions, and we compare the methods modified in this way. It appears that the dePDDP method is superior on datasets with relatively small numbers of clusters, whatever the cluster intermix, whereas our version of Bisecting K-Means is superior at greater numbers of clusters with noise entities added to the cluster structure.
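
The general idea of splitting a cluster at a minimum of a one-dimensional density can be sketched as follows. This is only an illustration under stated assumptions, not the dePDDP method or the authors' version of Bisecting K-Means: the function names, the Gaussian kernel, the rule-of-thumb bandwidth, and the fixed evaluation grid are all choices made for this example, and the projection direction is supplied by the caller (it could be a principal direction or a random one).

```python
import math


def project(points, direction):
    """Project each point onto the unit vector along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [sum(x * d for x, d in zip(p, direction)) / norm for p in points]


def kde_on_grid(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    coef = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        coef * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def density_minimum(values, grid_size=200):
    """Return (location, density) of the lowest interior local minimum
    of the estimated density, or None if no such minimum is found."""
    n = len(values)
    if n < 2:
        return None
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return None
    # Assumption: a normal-reference rule-of-thumb bandwidth.
    bandwidth = 1.06 * std * n ** (-0.2)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = kde_on_grid(values, grid, bandwidth)
    best = None
    for i in range(1, grid_size - 1):
        # Interior grid point that is not above its left neighbour
        # and strictly below its right neighbour.
        if dens[i] <= dens[i - 1] and dens[i] < dens[i + 1]:
            if best is None or dens[i] < dens[best]:
                best = i
    return None if best is None else (grid[best], dens[best])


def split_by_projection(points, direction):
    """Split `points` at the density minimum of their 1D projection.

    Returns (cut, left, right), or None when the projected density has
    no interior minimum, which serves as a "do not split" signal here.
    """
    values = project(points, direction)
    found = density_minimum(values)
    if found is None:
        return None
    cut = found[0]
    left = [p for p, v in zip(points, values) if v <= cut]
    right = [p for p, v in zip(points, values) if v > cut]
    return cut, left, right


# Two well-separated groups along the first coordinate (made-up data).
group_a = [(-1.0 + 2.0 * i / 29, 0.5 * (i % 3)) for i in range(30)]
group_b = [(9.0 + 2.0 * i / 19, 0.5 * (i % 3)) for i in range(20)]
result = split_by_projection(group_a + group_b, (1.0, 0.0))
if result is not None:
    cut, left, right = result
    print(f"cut at {cut:.2f}: {len(left)} points left, {len(right)} right")
```

In a divisive procedure, a split of this kind would be applied recursively to the resulting parts, and a part whose projected density shows no interior minimum would be left undivided.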

Author information

Authors and Affiliations

Department of Data Analysis and Machine Intelligence, National Research University Higher School of Economics, 20 Miasnitskaya, Moscow, Russia
Ekaterina V. Kovaleva & Boris G. Mirkin
Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
Boris G. Mirkin

Corresponding author

Correspondence to Ekaterina V. Kovaleva.

About this article

Cite this article

Kovaleva, E.V., Mirkin, B.G. Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison. J Classif 32, 414–442 (2015). https://doi.org/10.1007/s00357-015-9186-y

Issue Date: October 2015