Abstract
How closely related are two nodes in a graph? How can this score be computed quickly on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been used successfully in numerous settings, such as automatic captioning of images, generalizations of "connection subgraphs", personalized PageRank, and many more. However, straightforward implementations of RWR do not scale to large graphs: they require either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman–Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150× speedup with 90%+ quality preservation.
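To make the trade-off between the two straightforward implementations concrete, the following is a minimal sketch, not the authors' code. It assumes the common RWR formulation r = c W r + (1 - c) e, where W is the column-normalized adjacency matrix, e is the indicator vector of the query node, and 1 - c is the restart probability; this convention, the function names, and the toy graph are all assumptions made for illustration. The iterative version needs no pre-computation but repeats matrix-vector products for every query, while the direct version solves a dense n × n linear system, which is where the quadratic space and cubic time come from if the inverse is pre-computed.

```python
import numpy as np

def column_normalize(adj):
    """Scale each column of a nonnegative adjacency matrix to sum to 1."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave all-zero columns untouched
    return adj / col_sums

def rwr_iterative(w, seed, c=0.9, tol=1e-10, max_iter=1000):
    """On-line RWR: iterate r <- c*W*r + (1-c)*e until the L1 change is below tol."""
    n = w.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (w @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def rwr_direct(w, seed, c=0.9):
    """Direct RWR: solve (I - c*W) r = (1-c) e with a dense solver."""
    n = w.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    return (1 - c) * np.linalg.solve(np.eye(n) - c * w, e)

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

w = column_normalize(adj)
scores_iter = rwr_iterative(w, seed=0)
scores_direct = rwr_direct(w, seed=0)
print(np.round(scores_iter, 4))
```

Both functions return the same relevance vector up to the convergence tolerance; they differ only in where the cost is paid, which is the gap the low-rank approximation and partitioning described above aim to close.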
Cite this article
Tong, H., Faloutsos, C. & Pan, JY. Random walk with restart: fast solutions and applications. Knowl Inf Syst 14, 327–346 (2008). https://doi.org/10.1007/s10115-007-0094-2
DOI: https://doi.org/10.1007/s10115-007-0094-2