Definition
Sublinear clustering describes the process of clustering a given set of input objects using only a small subset of the input set, which is typically selected by a random process. A solution computed by a sublinear clustering algorithm is an implicit description of the clustering (rather than a partition of the input objects), for example in the form of cluster centers. Sublinear clustering is usually applied when the input set is too large to be processed with standard clustering algorithms.
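The idea in the definition above can be illustrated with a minimal sketch: draw a uniform random sample, cluster only the sample, and return cluster centers as the implicit description of the clustering. The function names, the `sample_size` parameter, and the use of Lloyd's k-means heuristic on the sample are assumptions made for illustration; they are not taken from this entry, and the sketch makes no claim about the approximation guarantees studied in the literature.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def lloyd_kmeans(points, k, iterations, rng):
    """Plain Lloyd iterations (a stand-in for any standard clustering routine)."""
    centers = rng.sample(points, k)  # initial centers drawn from the points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if no point was assigned to it
                centers[i] = tuple(sum(c) / len(group) for c in zip(*group))
    return centers


def sublinear_cluster(points, k, sample_size, iterations=20, seed=0):
    """Cluster a uniform random sample and return only the cluster centers.

    The expensive clustering step touches sample_size points, not the
    whole input. sample_size is a hypothetical tuning parameter here.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return lloyd_kmeans(sample, k, iterations, rng)


# Illustrative data: two well-separated Gaussian blobs in the plane.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(10000)]
data += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(10000)]

centers = sorted(sublinear_cluster(data, k=2, sample_size=200))
```

The returned centers describe the clustering implicitly: any input object can be assigned to a cluster later by calling `nearest_center` on it, without ever materializing a partition of the full input.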
Motivation and Background
Clustering is the process of partitioning a set of objects into subsets of similar objects. In machine learning, it is used, for example, in unsupervised learning to fit input data to a density model. In many modern applications of clustering, the input sets consist of billions of objects to be clustered. Typical examples include web search, analysis of web traffic, and spam detection. Therefore, even though many relatively efficient clustering algorithms are...
Recommended Reading
Alon, N., Dar, S., Parnas, M., & Ron, D. (2003). Testing of clustering. SIAM Journal on Discrete Mathematics, 16(3), 393–417.
Bădoiu, M., Har-Peled, S., & Indyk, P. (2002). Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), (pp. 250–257).
Ben-David, S. (2004). A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In Proceedings of the 17th Annual Conference on Learning Theory (COLT), (pp. 415–426).
Chen, K. (2006). On k-median clustering in high dimensions. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (pp. 1177–1185).
Czumaj, A., & Sohler, C. (2007). Sublinear-time approximation for clustering via random sampling. Random Structures & Algorithms, 30(1–2), 226–256.
Feldman, D., Monemizadeh, M., & Sohler, C. (2007). A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd Annual ACM Symposium on Computational Geometry (SoCG), (pp. 11–18).
Frahling, G., & Sohler, C. (2006). A fast k-means implementation using coresets. In Proceedings of the 22nd Annual ACM Symposium on Computational Geometry (SoCG), (pp. 135–143).
Har-Peled, S., & Kushal, A. (2005). Smaller coresets for k-median and k-means clustering. In Proceedings of the 21st Annual ACM Symposium on Computational Geometry (SoCG), (pp. 126–134).
Har-Peled, S., & Mazumdar, S. (2004). On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), (pp. 291–300).
Meyerson, A., O’Callaghan, L., & Plotkin, S. (2004). A k-median algorithm with running time independent of data size. Machine Learning, 56(1–3), 61–87.
Mishra, N., Oblinger, D., & Pitt, L. (2001). Sublinear time approximate clustering. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (pp. 439–447).
© 2011 Springer Science+Business Media, LLC
Cite this entry
Czumaj, A., Sohler, C. (2011). Sublinear Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_798
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8