Data Driven Similarity Measures for k-Means Like Clustering Algorithms

  • Published: April 2005
  • Volume 8, pages 331–349, (2005)
  • Jacob Kogan,
  • Marc Teboulle &
  • Charles Nicholas

Abstract

We present an optimization approach that generates k-means like clustering algorithms. The batch k-means and the incremental k-means are two well-known versions of the classical k-means clustering algorithm (Duda et al. 2000). To benefit from the speed of the batch version and the accuracy of the incremental version we combine the two in a “ping–pong” fashion. We use a distance-like function that combines the squared Euclidean distance with relative entropy. In the extreme cases our algorithm recovers the classical k-means clustering algorithm and generalizes the Divisive Information Theoretic clustering algorithm recently reported independently by Berkhin and Becher (2002) and Dhillon et al. (2002). Results of numerical experiments that demonstrate the viability of our approach are reported.
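The abstract combines two ingredients: a distance-like function blending the squared Euclidean distance with relative entropy, and a batch/incremental “ping–pong” alternation. The sketch below is a minimal illustration of that idea, not the authors' implementation: the names (`dist`, `nu`, `mu`, `ping_pong_kmeans`) are assumptions, the incremental pass recomputes the objective by brute force rather than via closed-form update deltas, and strictly positive data is assumed so the entropic term is well defined.

```python
import numpy as np

def kl(x, a, eps=1e-12):
    """Generalized relative entropy KL(x||a) = sum_j x_j log(x_j/a_j) - x_j + a_j,
    defined for nonnegative vectors (eps guards the logarithm and division)."""
    return float(np.sum(x * np.log((x + eps) / (a + eps)) - x + a))

def dist(x, a, nu=1.0, mu=0.0):
    """Distance-like function nu*||x - a||^2 + mu*KL(x||a).
    With mu = 0 this recovers the classical squared-Euclidean k-means distance."""
    return nu * float(np.sum((x - a) ** 2)) + mu * kl(x, a)

def quality(X, centers, labels, nu, mu):
    """Clustering objective: total distance from each point to its centroid."""
    return sum(dist(X[i], centers[labels[i]], nu, mu) for i in range(len(X)))

def batch_step(X, centers, nu, mu):
    """One batch k-means iteration: nearest-centroid assignment, then a mean
    update (for a fixed assignment the arithmetic mean minimizes both the
    quadratic and the entropic term of dist)."""
    k = len(centers)
    labels = np.array([min(range(k), key=lambda j: dist(x, centers[j], nu, mu))
                       for x in X])
    centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
    return labels, centers

def incremental_step(X, centers, labels, nu, mu):
    """Try single-point reassignments, accepting the first one that strictly
    lowers the objective. Brute force for clarity: a serious implementation
    would use closed-form formulas for the change in the objective."""
    k = len(centers)
    best = quality(X, centers, labels, nu, mu)
    for i in range(len(X)):
        for j in range(k):
            if j == labels[i]:
                continue
            trial = labels.copy()
            trial[i] = j
            trial_centers = np.array([X[trial == c].mean(axis=0) if (trial == c).any()
                                      else centers[c] for c in range(k)])
            if quality(X, trial_centers, trial, nu, mu) < best - 1e-12:
                return trial, trial_centers, True
    return labels, centers, False

def ping_pong_kmeans(X, centers, nu=1.0, mu=0.0, max_rounds=50):
    """Alternate batch passes (fast) with incremental passes (accurate)
    until neither changes the partition."""
    labels, centers = batch_step(X, centers, nu, mu)
    for _ in range(max_rounds):
        new_labels, centers = batch_step(X, centers, nu, mu)
        if np.array_equal(new_labels, labels):
            labels, centers, moved = incremental_step(X, centers, labels, nu, mu)
            if not moved:
                break
        else:
            labels = new_labels
    return labels, centers
```

Setting mu = 0 reduces the sketch to classical k-means, while nu = 0 on normalized nonnegative data corresponds to the entropic extreme the abstract mentions.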




References

  • Auslender A, Teboulle M and Ben-Tiba S (1999) Interior proximal and multiplier methods based on second order homogeneous kernels. Mathematics of Operations Research, 24:645–668.

  • Berkhin P and Becher JD (2002) Learning simple relations: Theory and applications. In: Proceedings of the Second SIAM International Conference on Data Mining.

  • Berry M and Browne M (1999) Understanding Search Engines. SIAM.

  • Bertsekas DP and Tsitsiklis JN (1989) Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, New Jersey.


  • Dempster A, Laird N and Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39.

  • Dhillon IS, Guan Y and Kogan J (2002) Refining clusters in high-dimensional text data. In: Proceedings of the Workshop on Clustering High Dimensional Data and its Applications (held in conjunction with the Second SIAM International Conference on Data Mining).

  • Dhillon, IS, Kogan J and Nicholas C (2003) Feature selection and document clustering, In Berry MW Ed. A Comprehensive Survey of Text Mining, pp. 73–100.

  • Dhillon IS and Modha DS (2001) Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175.


  • Duda RO, Hart PE and Stork DG (2000) Pattern Classification. John Wiley & Sons.

  • Forgy E (1965) Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21(3):768.


  • Dhillon IS, Mallela S and Kumar R (2002) Enhanced word clustering for hierarchical text classification. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), pp. 191–200.

  • Kogan J (2001) Clustering large unstructured document sets. In: Berry MW Ed. Computational Information Retrieval, pp. 107–117.

  • Kogan J (2001) Means clustering for text data. In: Proceedings of the Workshop on Text Mining at the First SIAM International Conference on Data Mining, pp. 47–54.

  • Kogan J, Teboulle M and Nicholas C (2003) Optimization approach to generating families of k-means like algorithms. In: Proceedings of the Workshop on Clustering High Dimensional Data and its Applications (held in conjunction with the Third SIAM International Conference on Data Mining).

  • Kogan J, Teboulle M and Nicholas C (2003) The entropic geometric means algorithm: An approach for building small clusters for large text datasets. In: Proceedings of the Workshop on Clustering Large Data Sets (held in conjunction with the Third IEEE International Conference on Data Mining), pp. 63–71.

  • Xu L and Jordan MI (1995) On convergence properties of the EM algorithm for Gaussian mixtures. MIT A.I. Memo No. 1520, C.B.C.L. Paper No. 111.

  • Rockafellar RT (1970) Convex Analysis. Princeton University Press, Princeton, NJ.


  • Teboulle M (1992) On ϕ-divergence and its applications. In: Phillips FY and Rousseau J Eds. Systems and Management Science by Extremal Methods–Research Honoring Abraham Charnes at Age 70, Kluwer Academic Publishers, pp. 255–273.

  • Teboulle M (1997) Convergence of proximal-like algorithms. SIAM J. of Optimization, 7:1069–1083.



Author information

Authors and Affiliations

  1. Department of Mathematics and Statistics, UMBC, Baltimore, MD, 21250

    Jacob Kogan

  2. School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel

    Marc Teboulle

  3. Department of Computer Science and Electrical Engineering, UMBC, Baltimore, MD, 21250

    Charles Nicholas


Corresponding author

Correspondence to Jacob Kogan.

Additional information

This research was supported in part by the US Department of Defense, the United States–Israel Binational Science Foundation (BSF), and Northrop Grumman Mission Systems (NG/MS).


About this article

Cite this article

Kogan, J., Teboulle, M. & Nicholas, C. Data Driven Similarity Measures for k-Means Like Clustering Algorithms. Inf Retrieval 8, 331–349 (2005). https://doi.org/10.1007/s10791-005-5666-8



Keywords

  • clustering algorithms
  • optimization
  • entropy