Abstract
Measuring the proximity between nodes is a fundamental problem in graph analysis. Random walk-based proximity measures have been shown to be effective and are widely used. Most existing random walk measures are based on the first-order Markov model, i.e., they assume that the next step of the random surfer depends only on the current node. However, this assumption does not hold in many real-life applications, and it fails to capture the clustering structure in the graph. To address this limitation of existing first-order measures, we study second-order random walk measures, which take the previously visited node into consideration. Whereas the existing first-order measures are built on node-to-node transition probabilities, the second-order random walk requires edge-to-edge transition probabilities. Using incidence matrices, we develop simple and elegant matrix representations for the second-order proximity measures. A desirable property of the developed measures is that they degenerate to their original first-order forms when the effect of the previous step is zero. We further develop Monte Carlo methods to efficiently compute the second-order measures and provide theoretical performance guarantees. Experimental results show that, in a variety of applications, the second-order measures can dramatically improve performance compared to their first-order counterparts.
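The idea of an edge-to-edge transition and its Monte Carlo estimation can be illustrated with a minimal Python sketch. It is not the formulation developed in the article: the memory model (a blend of the first-order probabilities out of the current and the previous node, weighted by a hypothetical parameter `alpha`), the function names, the restart probability, and the toy graph are all assumptions made for illustration. The sketch does reproduce the property stated above: with `alpha = 0` the walk reduces to the ordinary first-order walk.

```python
import random
from collections import Counter

# Toy undirected graph (assumption): two triangles joined by the bridge 2-3.
EXAMPLE_GRAPH = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

def first_order_probs(graph, v):
    """Uniform node-to-node transition probabilities out of node v."""
    nbrs = graph[v]
    return {w: 1.0 / len(nbrs) for w in nbrs}

def second_order_probs(graph, u, v, alpha):
    """P(next = w | previous = u, current = v), i.e. edge (u,v) -> edge (v,w).

    Hypothetical memory model: blend the first-order probability out of v
    with the first-order probability out of u, restricted to neighbours
    of v, then renormalise. alpha = 0 gives the first-order walk.
    """
    assert 0.0 <= alpha < 1.0
    p_v = first_order_probs(graph, v)
    if u is None or alpha == 0.0:
        return p_v  # no memory available or no memory effect
    p_u = first_order_probs(graph, u)
    weights = {w: (1 - alpha) * p_v[w] + alpha * p_u.get(w, 0.0) for w in p_v}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

def mc_rwr(graph, source, alpha, restart=0.15, num_walks=2000, seed=0):
    """Monte Carlo estimate of a random-walk-with-restart style proximity.

    Each walk starts at `source` with no memory and stops with probability
    `restart` before every step; the fraction of walks ending at a node is
    the estimate of that node's proximity to `source`.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        prev, cur = None, source
        while rng.random() >= restart:
            probs = second_order_probs(graph, prev, cur, alpha)
            nodes = list(probs)
            nxt = rng.choices(nodes, weights=[probs[n] for n in nodes])[0]
            prev, cur = cur, nxt
        ends[cur] += 1
    return {v: c / num_walks for v, c in ends.items()}
```

On the toy graph, a surfer that arrived at node 2 from node 0 is, under this assumed model, more likely to move to node 1 (inside the same triangle) than to cross the bridge to node 3, which is the kind of clustering effect a first-order walk cannot express.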
Notes
The entire graph is publicly available at http://webdatacommons.org.
The data are publicly available at http://dblp.uni-trier.de/xml/.
Acknowledgements
This work was partially supported by the National Science Foundation Grants IIS-11623-74, CAREER, and the NIH Grant R01GM115833.
About this article
Cite this article
Wu, Y., Zhang, X., Bian, Y. et al. Second-order random walk-based proximity measures in graph analysis: formulations and algorithms. The VLDB Journal 27, 127–152 (2018). https://doi.org/10.1007/s00778-017-0490-5
DOI: https://doi.org/10.1007/s00778-017-0490-5