Random walk with restart: fast solutions and applications

  • Regular Paper
Knowledge and Information Systems

Abstract

How closely related are two nodes in a graph? How can this score be computed quickly on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, such as automatic captioning of images, generalizations to “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale to large graphs: they require either quadratic space and cubic pre-computation time, or suffer slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman–Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150× speed-up with 90%+ quality preservation.
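To make the trade-off in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the commonly used RWR recurrence r = c·W·r + (1 − c)·e on a column-normalized adjacency matrix W, which the abstract itself does not state. The function names, the toy graph, the value c = 0.9, and the hand-picked partition are all illustrative assumptions. The third routine illustrates the "partition, then low-rank, then invert" idea using the Woodbury identity, the matrix generalization of the Sherman–Morrison lemma; here the cross-partition part is kept at its full numerical rank, so the result matches the exact solution rather than approximating it.

```python
import numpy as np

def column_normalize(A):
    """Scale each column to sum to 1 (assumes no isolated nodes)."""
    return A / A.sum(axis=0)

def rwr_iterative(W, seed, c=0.9, tol=1e-12, max_iter=10_000):
    """On-the-fly variant: iterate r <- c W r + (1 - c) e until stable."""
    e = np.zeros(W.shape[0])
    e[seed] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def rwr_direct(W, seed, c=0.9):
    """Dense variant: solve (I - c W) r = (1 - c) e directly."""
    n = W.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    return np.linalg.solve(np.eye(n) - c * W, (1.0 - c) * e)

def rwr_block_lowrank(W, blocks, seed, c=0.9, sv_tol=1e-10):
    """Split W into within-block part W1 and cross-block part W2 = U S Vt,
    then combine (I - c W1)^-1 with the low-rank term via Woodbury."""
    n = W.shape[0]
    W1 = np.zeros_like(W)
    for idx in blocks:
        W1[np.ix_(idx, idx)] = W[np.ix_(idx, idx)]
    W2 = W - W1                                  # cross-partition edges only
    U, s, Vt = np.linalg.svd(W2)
    keep = s > sv_tol                            # drop zero singular values
    U, s, Vt = U[:, keep], s[keep], Vt[keep, :]
    # W1 is block-diagonal, so a real system would invert block by block.
    Q1 = np.linalg.inv(np.eye(n) - c * W1)
    Lam = np.linalg.inv(np.diag(1.0 / s) - c * (Vt @ Q1 @ U))
    e = np.zeros(n)
    e[seed] = 1.0
    q = Q1 @ e
    return (1.0 - c) * (q + c * (Q1 @ (U @ (Lam @ (Vt @ q)))))

# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
W = column_normalize(A)
blocks = [[0, 1, 2], [3, 4, 5]]                  # assumed community partition

r_iter = rwr_iterative(W, seed=0)
r_direct = rwr_direct(W, seed=0)
r_block = rwr_block_lowrank(W, blocks, seed=0)
```

In this sketch the iterative routine needs no pre-computation but repeats the loop for every query, while the direct routine corresponds to inverting an n × n matrix. The block-plus-low-rank routine only inverts the small diagonal blocks and a matrix whose size equals the retained rank, which is where truncating to fewer singular values would trade accuracy for speed.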

Author information

Corresponding author

Correspondence to Hanghang Tong.

About this article

Cite this article

Tong, H., Faloutsos, C. & Pan, JY. Random walk with restart: fast solutions and applications. Knowl Inf Syst 14, 327–346 (2008). https://doi.org/10.1007/s10115-007-0094-2

  • DOI: https://doi.org/10.1007/s10115-007-0094-2
