Second-order random walk-based proximity measures in graph analysis: formulations and algorithms

Wu, Yubao; Zhang, Xiang; Bian, Yuchen; Cai, Zhipeng; Lian, Xiang; Liao, Xueting; Zhao, Fengpan

doi:10.1007/s00778-017-0490-5

Regular Paper
Published: 01 December 2017

Volume 27, pages 127–152, (2018)

The VLDB Journal

Yubao Wu ORCID: orcid.org/0000-0001-9356-8508¹,
Xiang Zhang²,
Yuchen Bian²,
Zhipeng Cai¹,
Xiang Lian³,
Xueting Liao¹ &
…
Fengpan Zhao¹

1596 Accesses
12 Citations

Abstract

Measuring the proximity between nodes is a fundamental problem in graph analysis. Random walk-based proximity measures have been shown to be effective and are widely used. Most existing random walk measures are based on the first-order Markov model, i.e., they assume that the next step of the random surfer depends only on the current node. However, this assumption does not hold in many real-life applications, and it fails to capture the clustering structure in the graph. To address this limitation of the existing first-order measures, we study second-order random walk measures, which take the previously visited node into consideration. While the existing first-order measures are built on node-to-node transition probabilities, the second-order random walk requires edge-to-edge transition probabilities. Using incidence matrices, we develop simple and elegant matrix representations for the second-order proximity measures. A desirable property of the developed measures is that they degenerate to their original first-order forms when the effect of the previous step is zero. We further develop Monte Carlo methods to compute the second-order measures efficiently and provide theoretical performance guarantees. Experimental results show that, in a variety of applications, the second-order measures can dramatically improve performance compared to their first-order counterparts.
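
To make the idea of a walk that remembers its previous node more concrete, the following Python sketch estimates a random-walk-with-restart style proximity by Monte Carlo simulation. It is an illustration under stated assumptions, not the formulation or algorithm of the paper: the function names, the parameter `alpha`, and the memory rule (with probability `alpha`, move to a neighbor shared by the previous and current nodes) are all hypothetical choices made for this example. The only property borrowed from the abstract is that setting the memory effect to zero (`alpha = 0`) recovers the ordinary first-order walk.

```python
import random
from collections import Counter


def second_order_step(graph, prev, cur, alpha, rng):
    """Choose the next node given the previous and current nodes.

    Assumption (illustrative only): with probability alpha the walker
    prefers neighbors of `cur` that are also neighbors of `prev`;
    otherwise it moves uniformly over the neighbors of `cur`.
    With alpha = 0 this is exactly the first-order rule.
    """
    neighbors = graph[cur]
    if prev is not None and rng.random() < alpha:
        shared = [v for v in neighbors if v in graph[prev]]
        if shared:
            return rng.choice(shared)
    return rng.choice(neighbors)


def mc_second_order_rwr(graph, query, alpha=0.5, restart=0.15,
                        num_walks=20000, seed=0):
    """Monte Carlo proximity estimate from `query` to every node.

    Each walk starts at `query` and stops with probability `restart`
    before every step; the empirical distribution of end nodes is
    returned as the proximity vector.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        prev, cur = None, query
        while rng.random() >= restart:
            prev, cur = cur, second_order_step(graph, prev, cur, alpha, rng)
        ends[cur] += 1
    return {node: count / num_walks for node, count in ends.items()}


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3 (a hypothetical toy graph).
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    for a in (0.0, 0.9):
        scores = mc_second_order_rwr(toy, query=0, alpha=a)
        print(a, sorted(scores.items()))
```

The graph is passed as a dictionary mapping each node to a non-empty list of neighbors. On a graph with clear clusters, a memory rule of this kind tends to keep the walker inside the query node's cluster, which is one intuition for why a second-order walk might reflect clustering structure better than a first-order one.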

Notes

The entire graph is publicly available at http://webdatacommons.org.
The data are publicly available at http://dblp.uni-trier.de/xml/.

Acknowledgements

This work was partially supported by the National Science Foundation Grants IIS-11623-74, CAREER, and the NIH Grant R01GM115833.

Author information

Authors and Affiliations

Department of Computer Science, Georgia State University, Atlanta, GA, USA
Yubao Wu, Zhipeng Cai, Xueting Liao & Fengpan Zhao
College of Information Sciences and Technology, The Pennsylvania State University, State College, PA, USA
Xiang Zhang & Yuchen Bian
Department of Computer Science, Kent State University, Kent, OH, USA
Xiang Lian

Corresponding author

Correspondence to Yubao Wu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 9422 KB)

Supplementary material 2 (pdf 1316 KB)

Supplementary material 3 (pdf 399 KB)

Supplementary material 4 (pdf 1056 KB)

About this article

Cite this article

Wu, Y., Zhang, X., Bian, Y. et al. Second-order random walk-based proximity measures in graph analysis: formulations and algorithms. The VLDB Journal 27, 127–152 (2018). https://doi.org/10.1007/s00778-017-0490-5

Received: 08 May 2017
Revised: 19 September 2017
Accepted: 31 October 2017
Published: 01 December 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s00778-017-0490-5
