Abstract
Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects, such as the similarities between objects. In this chapter we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. We study a hierarchical structure called SimTree, which represents similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. We introduce an efficient algorithm for computing similarities utilizing the SimTree.
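To make the recursive definition above concrete, here is a minimal sketch of link-based similarity in which two objects are similar to the degree that the objects linked with them are similar. It assumes a SimRank-style update rule (in the spirit of the Jeh and Widom measure listed in the references) and is not the SimTree-based algorithm of this chapter: it computes and stores every pairwise similarity, which is exactly the cost SimTree is meant to avoid. The function name `link_similarity`, the `decay` and `iterations` parameters, and the toy author/conference data are all hypothetical.

```python
from itertools import product


def link_similarity(links, decay=0.8, iterations=5):
    """Iteratively estimate pairwise similarity from links (illustrative only).

    links: dict mapping each object to the set of objects linked with it.
    Links are assumed symmetric: if b is in links[a], then a is in links[b].
    """
    nodes = sorted(links)
    # Start from the identity: an object is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
            elif not links[a] or not links[b]:
                new_sim[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of linked objects,
                # damped by the decay factor (an assumed parameter).
                total = sum(sim[(x, y)] for x in links[a] for y in links[b])
                new_sim[(a, b)] = decay * total / (len(links[a]) * len(links[b]))
        sim = new_sim
    return sim


# Hypothetical toy data: authors a1..a3 linked with conferences c1..c3.
links = {
    "a1": {"c1"},
    "a2": {"c1", "c2"},
    "a3": {"c3"},
    "c1": {"a1", "a2"},
    "c2": {"a2"},
    "c3": {"a3"},
}
sim = link_similarity(links)
print(sim[("a1", "a2")] > sim[("a1", "a3")])
```

In this toy data, a1 and a2 share conference c1 and so receive a positive similarity, while a3 is linked only with c3 and stays at zero similarity to both. Note that the dictionary holds one entry per ordered pair of objects, so memory grows quadratically with the number of objects.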
Notes
1. Here conferences refer to conferences, journals, and workshops. We are only interested in productive authors and well-known conferences because it is easier to determine the research fields related to each of them, from which the accuracy of clustering will be judged.
2. Since no frequent patterns of conferences can be found using the proceedings linked to them, LinkClus uses authors linked with conferences to find frequent patterns of conferences, in order to build the initial SimTree for conferences.
3. We do not test SimRank and F-SimRank on large databases because they consume too much memory.
References
C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, and J. S. Park. Fast algorithms for projected clustering. In SIGMOD, Philadelphia, PA, 1999.
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, Washington, DC, 1993.
Y. Bartal. On approximating arbitrary metrics by tree metrics. In STOC, Dallas, TX, 1998.
R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML, Bonn, Germany, 2005.
D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, Seattle, WA, 2004.
Y. Cheng and G. M. Church. Biclustering of expression data. In ISMB, La Jolla, CA, 2000.
DBLP Bibliography. www.informatik.uni-trier.de/~ley/db/
I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In KDD, Washington, DC, 2003.
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In SIGCOMM, Cambridge, MA, 1999.
D. Fogaras and B. Rácz. Scaling link-based similarity search. In WWW, Chiba, Japan, 2005.
S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. In SIGMOD, Seattle, WA, 1998.
J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. In ICDM, Maebashi City, Japan, 2002.
G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In KDD, Edmonton, Canada, 2002.
M. Kirsten and S. Wrobel. Relational distance-based clustering. In ILP, Madison, WI, 1998.
J. MacQueen. Some methods for classification and analysis of multivariate observations. In Berkeley Symposium, Berkeley, CA, 1967.
R. T. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. In VLDB, Santiago de Chile, Chile, 1994.
R. Sibson. SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1):30–34, 1973.
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to data mining. Addison-Wesley, New York, NY, 2005.
J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In KDD, Washington, DC, 2003.
J. D. Wang, H. J. Zeng, Z. Chen, H. J. Lu, L. Tao, and W. Y. Ma. ReCoM: Reinforcement clustering of multi-type interrelated data objects. In SIGIR, Toronto, Canada, 2003.
X. Yin, J. Han, and P. S. Yu. Cross-relational clustering with user’s guidance. In KDD, Chicago, IL, 2005.
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In SIGMOD, Montreal, Canada, 1996.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Yin, X., Han, J., Yu, P.S. (2010). Scalable Link-Based Similarity Computation and Clustering. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_2
DOI: https://doi.org/10.1007/978-1-4419-6515-8_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6514-1
Online ISBN: 978-1-4419-6515-8
eBook Packages: Biomedical and Life Sciences (R0)