Sublinear Clustering

  • Reference work entry
Encyclopedia of Machine Learning

Definition

Sublinear clustering describes the process of clustering a given set of input objects using only a small subset of the input set, which is typically selected by a random process. A solution computed by a sublinear clustering algorithm is an implicit description of the clustering (rather than a partition of the input objects), for example in the form of cluster centers. Sublinear clustering is usually applied when the input set is too large to be processed with standard clustering algorithms.
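
The idea in the definition can be illustrated with a minimal sketch. It assumes points are tuples of floats in Euclidean space, that the subset is drawn uniformly at random, and that the sample is clustered with a plain Lloyd-style k-means heuristic. These choices, along with the names `sublinear_cluster_centers`, `lloyd_kmeans`, and `assign`, are illustrative assumptions and are not taken from the entry or from any of the works listed under Recommended Reading. The sketch carries no approximation guarantee.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iterations=20, rng=None):
    """Plain Lloyd iteration on a (small) list of points; returns k centers."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # initial centers: k distinct sample points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                dim = len(members[0])
                centers[i] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers


def sublinear_cluster_centers(points, k, sample_size, seed=0):
    """Cluster only a uniform random sample and return the cluster centers.

    The returned centers are an implicit description of the clustering:
    no partition of the full input is ever materialized.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return lloyd_kmeans(sample, k, rng=rng)


def assign(point, centers):
    """Index of the center nearest to `point`, computed on demand."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of 5000 points each.
    data_rng = random.Random(1)
    data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(5000)]
    data += [(data_rng.gauss(100, 1), data_rng.gauss(100, 1)) for _ in range(5000)]
    centers = sublinear_cluster_centers(data, k=2, sample_size=200)
    print(centers, assign((1.0, -1.0), centers))
```

Only the 200 sampled points are touched during clustering. Any other input object can later be mapped to a cluster by calling `assign` with the returned centers, which is what makes the output an implicit description rather than an explicit partition.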

Motivation and Background

Clustering is the process of partitioning a set of objects into subsets of similar objects. In machine learning it is used, for example, in unsupervised learning to fit input data to a density model. In many modern applications of clustering, the input sets consist of billions of objects to be clustered. Typical examples include web search, analysis of web traffic, and spam detection. Therefore, even though many relatively efficient clustering algorithms are...

Recommended Reading

  • Alon, N., Dar, S., Parnas, M., & Ron, D. (2003). Testing of clustering. SIAM Journal on Discrete Mathematics, 16(3), 393–417.

  • Bădoiu, M., Har-Peled, S., & Indyk, P. (2002). Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), (pp. 250–257).

  • Ben-David, S. (2004). A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In Proceedings of the 17th Annual Conference on Learning Theory (COLT), (pp. 415–426).

  • Chen, K. (2006). On k-median clustering in high dimensions. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (pp. 1177–1185).

  • Czumaj, A., & Sohler, C. (2007). Sublinear-time approximation for clustering via random sampling. Random Structures & Algorithms, 30(1–2), 226–256.

  • Feldman, D., Monemizadeh, M., & Sohler, C. (2007). A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd Annual ACM Symposium on Computational Geometry (SoCG), (pp. 11–18).

  • Frahling, G., & Sohler, C. (2006). A fast k-means implementation using coresets. In Proceedings of the 22nd Annual ACM Symposium on Computational Geometry (SoCG), (pp. 135–143).

  • Har-Peled, S., & Kushal, A. (2005). Smaller coresets for k-median and k-means clustering. In Proceedings of the 21st Annual ACM Symposium on Computational Geometry (SoCG), (pp. 126–134).

  • Har-Peled, S., & Mazumdar, S. (2004). On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), (pp. 291–300).

  • Meyerson, A., O’Callaghan, L., & Plotkin, S. (2004). A k-median algorithm with running time independent of data size. Machine Learning, 56(1–3), 61–87.

  • Mishra, N., Oblinger, D., & Pitt, L. (2001). Sublinear time approximate clustering. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (pp. 439–447).

Copyright information

© 2011 Springer Science+Business Media, LLC

About this entry

Cite this entry

Czumaj, A., Sohler, C. (2011). Sublinear Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_798
