Abstract
In this paper we present a new approach to clustering a data set for which the only available information is a similarity measure between every pair of elements. The objective is to partition the set into disjoint subsets such that two elements assigned to the same subset are more likely to have a high similarity than two elements assigned to different subsets. The algorithm makes no assumptions about the size or number of clusters and assumes no constraints on the similarity measure. It relies on very simple operations: its running time is dominated by matrix multiplication and, in some cases, curve fitting. We present experimental results from several implementations of this method.
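To make the stated objective concrete, the following minimal Python sketch compares the mean similarity of pairs placed in the same subset with that of pairs placed in different subsets. It is not the algorithm of the paper, which the abstract does not specify; the function name `mean_pair_similarity`, the toy similarity matrix, and the labels are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mean_pair_similarity(sim, labels):
    """Return (within, between): the mean similarity over pairs assigned
    to the same cluster and over pairs assigned to different clusters."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(sim[i][j])
        else:
            between.append(sim[i][j])

    def mean(values):
        # An empty group has no defined mean; report NaN instead of failing.
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Hypothetical toy input: 4 elements with symmetric pairwise similarities.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
labels = [0, 0, 1, 1]  # candidate partition: {0, 1} and {2, 3}

within, between = mean_pair_similarity(sim, labels)
print(f"within-cluster mean:  {within:.2f}")
print(f"between-cluster mean: {between:.2f}")
```

A partition that meets the objective should yield a within-cluster mean noticeably higher than the between-cluster mean, as this toy partition does.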
This work was partially supported by NSF Grants EIA-98-02068 and BCS-99-78116. Research supported by NSF Career Award CCR-9624828, a Dartmouth Fellowship, and NSF Grant EIA-98-02068. Research partially supported by NSF Career Award CCR-9624828, NSF Grant EIA-98-02068, a Dartmouth Fellowship, and an Alfred P. Sloan Foundation Fellowship.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aslam, J., Leblanc, A., Stein, C. (2001). Clustering Data without Prior Knowledge. In: Näher, S., Wagner, D. (eds) Algorithm Engineering. WAE 2000. Lecture Notes in Computer Science, vol 1982. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44691-5_7
DOI: https://doi.org/10.1007/3-540-44691-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42512-0
Online ISBN: 978-3-540-44691-0
eBook Packages: Springer Book Archive