Clustering Large Datasets Using Cobweb and K-Means in Tandem

Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard

doi:10.1007/978-3-540-30549-1_33

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

This paper presents a single-scan algorithm for clustering large datasets. It uses a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared with a single-pass K-means algorithm and with standard K-means. The evaluation examines clustering error, measured by the sum of squared errors (SSE), and sensitivity to the order in which data points are processed.
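The two-phase idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' Scalable Cobweb. Phase 1 here is a hypothetical leader-style single pass that stands in for the modified Cobweb tree. Each point joins the nearest subcluster whose centroid lies within `radius`; otherwise it opens a new subcluster. Cobweb's category-utility splitting and the tree balancing are not reproduced. Phase 2 runs K-means on the subcluster centroids, weighting each centroid by its point count. With that weighting, every center update equals the mean of the underlying points. The `sse` function computes the sum of squared errors used as the error measure. All function names and the `radius`, `k`, `iters`, and `seed` parameters are illustrative assumptions.

```python
import random


class Subcluster:
    """Running summary of a group of points: point count and per-dimension sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # linear sum of the member points

    def add(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v

    def centroid(self):
        return [s / self.n for s in self.ls]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def single_scan_subclusters(points, radius):
    """Hypothetical phase-1 stand-in (NOT the modified Cobweb tree).

    Reads each point once: it joins the nearest existing subcluster if that
    centroid lies within `radius`, otherwise it opens a new subcluster.
    """
    subs = []
    for x in points:
        best = min(subs, key=lambda s: sq_dist(s.centroid(), x), default=None)
        if best is not None and sq_dist(best.centroid(), x) <= radius ** 2:
            best.add(x)
        else:
            s = Subcluster(len(x))
            s.add(x)
            subs.append(s)
    return subs


def weighted_kmeans(subs, k, iters=20, seed=0):
    """Phase 2: K-means over subcluster centroids, weighting each by its count.

    Requires k <= len(subs); an empty cluster keeps its previous center.
    """
    rng = random.Random(seed)
    cents = [s.centroid() for s in subs]
    weights = [s.n for s in subs]
    centers = [list(c) for c in rng.sample(cents, k)]
    for _ in range(iters):
        # Assignment step: nearest center for every subcluster centroid.
        assign = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in cents]
        # Update step: count-weighted mean of the assigned centroids, which
        # equals the plain mean of all points in those subclusters.
        for j in range(k):
            members = [(c, w) for c, w, a in zip(cents, weights, assign) if a == j]
            total = sum(w for _, w in members)
            if total > 0:
                dim = len(centers[j])
                centers[j] = [sum(c[i] * w for c, w in members) / total for i in range(dim)]
    return centers


def sse(points, centers):
    """Sum of squared errors: squared distance from each point to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    subs = single_scan_subclusters(pts, radius=2.0)
    centers = weighted_kmeans(subs, k=2)
    print(len(subs), round(sse(pts, centers), 3))  # → 2 2.667
```

If the same points arrive in a different order, phase 1 can form different subclusters. Trying several orders is one simple way to probe the order sensitivity that the abstract's evaluation examines.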

Author information

Authors and Affiliations

Department of Computer Science, University of Waikato, Hamilton, New Zealand
Mi Li, Geoffrey Holmes & Bernhard Pfahringer

Editor information

Editors and Affiliations

Faculty of Information Technology, Monash University, VIC 3800, Australia
Geoffrey I. Webb
Science, Engineering and Technology Portfolio, Royal Melbourne Institute of Technology, VIC 3001, Melbourne, Australia
Xinghuo Yu

About this paper

Cite this paper

Li, M., Holmes, G., Pfahringer, B. (2004). Clustering Large Datasets Using Cobweb and K-Means in Tandem. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science (LNAI), vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_33

DOI: https://doi.org/10.1007/978-3-540-30549-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)