Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison

Journal of Classification

Abstract

The paper presents a least squares framework for divisive clustering. Two popular divisive clustering methods, Bisecting K-Means and Principal Direction Division (PDD), turn out to be versions of the same least squares approach. PDD has recently been enhanced with a stopping criterion that takes into account the minima of the corresponding one-dimensional density function (the dePDDP method). We extend this approach to Bisecting K-Means by projecting the data onto random directions, and we compare the two methods thus modified. In our experiments, dePDDP is superior on datasets with relatively small numbers of clusters, whatever the cluster intermix, whereas our version of Bisecting K-Means is superior at greater numbers of clusters with noise entities added to the cluster structure.
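
The idea of a density-minimum stopping criterion on a one-dimensional projection can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the Gaussian kernel, the rule-of-thumb bandwidth, and the grid search are all assumptions made for illustration. The sketch projects a cluster onto a direction, estimates the density of the projections, and splits at the deepest interior minimum. If no minimum exists, the cluster is left undivided.

```python
import math
import random


def project(points, direction):
    """Project each point onto the unit vector along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [sum(x * d for x, d in zip(p, direction)) / norm for p in points]


def density_minimum(values, grid_size=100):
    """Location of the deepest interior local minimum of a Gaussian kernel
    density estimate of `values`, or None if the estimate has no such minimum."""
    n = len(values)
    if n < 3:
        return None
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return None
    h = 1.06 * std * n ** -0.2  # rule-of-thumb bandwidth (an assumption)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised density: the constant factor does not move the minima.
    dens = [sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) for g in grid]
    minima = [(dens[i], grid[i]) for i in range(1, grid_size - 1)
              if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]]
    return min(minima)[1] if minima else None


def split_on_direction(points, direction):
    """Bisect `points` at the density minimum of their 1D projection.
    Returns (left, right), or None when there is no minimum (stop splitting)."""
    proj = project(points, direction)
    cut = density_minimum(proj)
    if cut is None:
        return None
    left = [p for p, z in zip(points, proj) if z <= cut]
    right = [p for p, z in zip(points, proj) if z > cut]
    return left, right


def random_direction(dim, rng):
    """Draw a direction with independent standard normal coordinates."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    a = [(0.1 * i, 0.1 * i) for i in range(-5, 6)]
    b = [(10 + 0.1 * i, 10 + 0.1 * i) for i in range(-5, 6)]
    parts = split_on_direction(a + b, random_direction(2, random.Random(0)))
    print("stop" if parts is None else [len(part) for part in parts])
```

A single random direction may fail to separate two clusters when it is nearly orthogonal to the line joining them, so a practical variant would presumably try several directions. How the paper selects among them is not stated in the abstract.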

Author information

Corresponding author

Correspondence to Ekaterina V. Kovaleva.

About this article

Cite this article

Kovaleva, E.V., Mirkin, B.G. Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison. J Classif 32, 414–442 (2015). https://doi.org/10.1007/s00357-015-9186-y

  • DOI: https://doi.org/10.1007/s00357-015-9186-y
