DOI: 10.1145/1739041.1739098
research-article

Fast computation of SimRank for static and dynamic information networks

Published: 22 March 2010

Abstract

Information networks are ubiquitous in many applications, and the analysis of such networks has attracted significant attention in the academic community. One of the most important aspects of information network analysis is measuring the similarity between nodes in a network. SimRank is a simple and influential measure of this kind, based on a solid theoretical "random surfer" model. Existing work computes SimRank similarity scores iteratively. We argue that the iterative method can be infeasible and inefficient when, as in many real-world scenarios, the networks change dynamically and frequently. We envision a non-iterative method to bridge this gap. It allows users not only to update the similarity scores incrementally, but also to derive similarity scores for an arbitrary subset of nodes. To enable non-iterative computation, we propose rewriting the SimRank equation in a non-iterative form using the Kronecker product and vectorization operators. Based on this, we develop a family of novel approximate SimRank computation algorithms for static and dynamic information networks, and give their corresponding theoretical justification and analysis. The non-iterative method supports efficient processing of various node analyses, including similarity tracking and centrality tracking on evolving information networks. The effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets.
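
To illustrate the Kronecker-product and vectorization idea mentioned in the abstract, here is a minimal sketch. It is not the paper's algorithm. It assumes the commonly used matrix form S = c·WᵀSW + (1 − c)·I, where W is the column-normalized adjacency matrix and c is the decay factor; this form approximates the original SimRank definition because it does not force the diagonal to be exactly 1. Applying the identity vec(AXB) = (Bᵀ ⊗ A)·vec(X) turns the recursion into one linear system, vec(S) = (1 − c)·(I − c·(Wᵀ ⊗ Wᵀ))⁻¹·vec(I). The function names, the value c = 0.6, and the toy graph are assumptions made for illustration only.

```python
import numpy as np


def column_normalize(adj):
    """Divide each column by its sum; columns of nodes without in-links stay zero."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    return np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums > 0)


def simrank_closed_form(adj, c=0.6):
    """Solve vec(S) = c (W^T kron W^T) vec(S) + (1 - c) vec(I) as one linear system.

    adj[i, j] = 1 means there is an edge i -> j (assumed convention).
    """
    W = column_normalize(adj)
    n = W.shape[0]
    # vec(W^T S W) = (W^T kron W^T) vec(S), with vec() stacking columns.
    K = np.kron(W.T, W.T)
    rhs = (1.0 - c) * np.eye(n).reshape(-1, order="F")
    vec_s = np.linalg.solve(np.eye(n * n) - c * K, rhs)
    return vec_s.reshape((n, n), order="F")


def simrank_iterative(adj, c=0.6, iters=200):
    """Fixed-point iteration on the same matrix equation, used only as a cross-check."""
    W = column_normalize(adj)
    n = W.shape[0]
    S = np.eye(n)
    for _ in range(iters):
        S = c * (W.T @ S @ W) + (1.0 - c) * np.eye(n)
    return S


# Toy directed graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 0.
adj = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
S = simrank_closed_form(adj)
print(np.round(S, 4))
```

The dense Kronecker matrix has n² × n² entries, so this direct solve is only practical for very small graphs. It is meant to show why a non-iterative form exists, not how to compute it at scale.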


Published In

EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
March 2010
741 pages
ISBN: 9781605589459
DOI: 10.1145/1739041

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. SimRank
  2. graph
  3. information network
  4. similarity measure

Qualifiers

  • Research-article

Conference

EDBT/ICDT '10
EDBT/ICDT '10: EDBT/ICDT '10 joint conference
March 22 - 26, 2010
Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Cited By

  • (2024) StructSim: Meta-Structure-Based Similarity Measure in Heterogeneous Information Networks. Applied Sciences 14:2 (935). DOI: 10.3390/app14020935. Online publication date: 22-Jan-2024
  • (2024) Link prediction based on spectral analysis. PLOS ONE 19:1 (e0287385). DOI: 10.1371/journal.pone.0287385. Online publication date: 2-Jan-2024
  • (2024) I-CoSim: Efficient Dynamic CoSimRank Retrieval on Evolving Networks. Companion Proceedings of the ACM Web Conference 2024 (923-926). DOI: 10.1145/3589335.3651523. Online publication date: 13-May-2024
  • (2024) SimRank on Data Streams. 2024 IEEE 31st International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW) (137-138). DOI: 10.1109/HiPCW63042.2024.00046. Online publication date: 18-Dec-2024
  • (2024) Fast computation of General SimRank on heterogeneous information network. Discover Computing 27:1. DOI: 10.1007/s10791-024-09438-5. Online publication date: 21-May-2024
  • (2023) Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and Improvement. Proceedings of the VLDB Endowment 17:4 (617-629). DOI: 10.14778/3636218.3636219. Online publication date: 1-Dec-2023
  • (2023) Efficient Single-Source SimRank Query by Path Aggregation. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3342-3352). DOI: 10.1145/3580305.3599328. Online publication date: 4-Aug-2023
  • (2023) Everything Evolves in Personalized PageRank. Proceedings of the ACM Web Conference 2023 (3342-3352). DOI: 10.1145/3543507.3583474. Online publication date: 30-Apr-2023
  • (2023) All-Pairs SimRank Updates on Dynamic Graphs. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (131-138). DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00050. Online publication date: 21-Dec-2023
  • (2023) Hierarchical All-Pairs SimRank Calculation. Database Systems for Advanced Applications (252-268). DOI: 10.1007/978-3-031-30675-4_17. Online publication date: 15-Apr-2023
