Calculating Similarity Efficiently in a Small World

Jia, Xu; Cai, Yuanzhe; Liu, Hongyan; He, Jun; Du, Xiaoyong

doi:10.1007/978-3-642-03348-3_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

SimRank is a well-known algorithm for similarity calculation based on link analysis. However, it suffers from high computational cost. It has been shown that the World Wide Web graph is a “small world graph”. In this paper, we observe that for this kind of small world graph, node pairs whose similarity scores are zero after the first several iterations remain zero in the final output. Based on this observation, we propose a novel algorithm called SW-SimRank, which speeds up similarity calculation by not recalculating the similarity scores of those unreachable pairs. Our experimental results on web datasets show the efficiency of our approach. The larger the proportion of unreachable pairs in the relationship graph, the greater the improvement SW-SimRank achieves. In addition, SW-SimRank can be integrated with other SimRank acceleration methods.
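
The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SW-SimRank implementation: the function name `simrank_pruned`, the parameters `c`, `iters` and `warmup`, and the toy graph are all assumptions made for illustration. The sketch runs the standard SimRank iteration and, after `warmup` iterations, stops recomputing every pair whose score is still zero. On a general graph this is only a heuristic, since a pair that is zero early could become non-zero later; the abstract's claim is that this does not happen in practice on small world graphs.

```python
from itertools import combinations


def simrank_pruned(in_nbrs, c=0.8, iters=10, warmup=3):
    """Iterative SimRank that skips pairs still at zero after `warmup` rounds.

    in_nbrs maps each node to the list of its in-neighbours.
    Scores are returned for unordered pairs (a, b) with a < b; s(a, a) = 1
    is implicit. Passing warmup=0 disables pruning (hypothetical convention).
    """
    nodes = sorted(in_nbrs)

    def score(table, a, b):
        # Look up s(a, b); missing pairs are treated as zero.
        if a == b:
            return 1.0
        return table.get((a, b) if a < b else (b, a), 0.0)

    # A pair can only be non-zero if both nodes have in-neighbours.
    active = [(a, b) for a, b in combinations(nodes, 2)
              if in_nbrs[a] and in_nbrs[b]]
    scores = {}
    for k in range(iters):
        new_scores = {}
        for a, b in active:
            total = sum(score(scores, i, j)
                        for i in in_nbrs[a] for j in in_nbrs[b])
            value = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
            if value > 0.0:
                new_scores[(a, b)] = value
        scores = new_scores
        if k + 1 == warmup:
            # Assumed pruning step: drop pairs that are still zero.
            active = [pair for pair in active if pair in scores]
    return scores


# Toy graph (assumed): u points to a and b; v points to x; w points to y.
graph = {"u": [], "a": ["u"], "b": ["u"],
         "v": [], "w": [], "x": ["v"], "y": ["w"]}
scores = simrank_pruned(graph)
unpruned = simrank_pruned(graph, warmup=0)
```

In this toy graph, `a` and `b` share the in-neighbour `u` and receive a non-zero score, while `x` and `y` have no common ancestor, stay at zero, and are dropped from the active set after the warm-up. Only pairs with non-zero scores are stored, so the saving grows with the proportion of unreachable pairs.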

Author information

Authors and Affiliations

Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing
Xu Jia, Yuanzhe Cai, Jun He & Xiaoyong Du
Department of Computer Science, Renmin University of China, 100872, Beijing
Xu Jia, Yuanzhe Cai, Jun He & Xiaoyong Du
Department of Management Science and Engineering, Tsinghua University, 100084, Beijing
Hongyan Liu

Editor information

Editors and Affiliations

Knowledge Science & Engineering Institute, School of Education Technology, Beijing Normal University, Xinjiekouwai Ave. 19, 100875, Beijing, China
Ronghuai Huang
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Qiang Yang
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
João Gama
School of Information, Zhongguancun, Renmin University, 100872, Beijing, China
Xiaofeng Meng
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, St. Lucia, Queensland, Australia
Xue Li

About this paper

Cite this paper

Jia, X., Cai, Y., Liu, H., He, J., Du, X. (2009). Calculating Similarity Efficiently in a Small World. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_19

DOI: https://doi.org/10.1007/978-3-642-03348-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03347-6
Online ISBN: 978-3-642-03348-3
eBook Packages: Computer Science, Computer Science (R0)
