Accuracy Estimation of Link-Based Similarity Measures and Its Application

Zhang, Yinglong; Li, Cuiping; Xie, Chengwang; Chen, Hong

doi:10.1007/978-3-319-08010-9_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Link-based similarity measures play a significant role in many graph-based applications, so measuring the similarity of nodes in a graph is a fundamental problem in graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. In practice, PPR and SR scores are computed iteratively, and the computational overhead grows with the number of iterations. Ideally, one would run only the minimum number of iterations needed to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, this paper focuses on designing accurate and tight upper bounds for PPR and SR. Our upper bounds are based on the following intuition: “the smaller the difference between the results of two consecutive iteration steps, the smaller the difference between the iterative similarity scores and the theoretical ones”. Furthermore, we demonstrate the effectiveness of our novel upper bounds in top-k similar-node queries, where they speed up query processing. Finally, we run a comprehensive set of experiments on real data sets to verify the effectiveness and efficiency of our upper bounds.
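
The intuition quoted in the abstract can be illustrated with a small sketch. The code below is not the paper's bound, which is not given in this abstract. It uses the standard contraction-mapping (Banach fixed-point) estimate for PPR power iteration instead: with restart probability c, the L1 distance from the current iterate to the exact scores is at most (1 - c)/c times the L1 difference between the last two iterates. The function name, the parameter names, and the dictionary-based graph format are assumptions made for this illustration, and every neighbor is assumed to appear as a key of the graph.

```python
def ppr_with_posterior_bound(out_neighbors, source, restart=0.15,
                             tol=1e-6, max_iter=1000):
    """Power iteration for Personalized PageRank from a single source.

    Stops once the a posteriori error bound (1 - restart) / restart * diff
    falls below tol, where diff is the L1 difference between the two most
    recent iterates. This is the generic contraction bound, used here as a
    stand-in for the tighter bounds the paper proposes.
    """
    nodes = list(out_neighbors)
    decay = 1.0 - restart
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0
    bound = float("inf")
    for iteration in range(1, max_iter + 1):
        # Restart mass always returns to the source node.
        nxt = {v: 0.0 for v in nodes}
        nxt[source] = restart
        # The remaining mass is spread evenly over out-neighbors.
        for u, nbrs in out_neighbors.items():
            if not nbrs:
                continue  # dangling node: its mass is dropped in this sketch
            share = decay * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        diff = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        # Upper bound on the L1 distance between scores and the exact PPR.
        bound = decay / restart * diff
        if bound <= tol:
            return scores, iteration, bound
    return scores, max_iter, bound


# Hypothetical three-node cycle a -> b -> c -> a, queried from node "a".
graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
scores, iterations, bound = ppr_with_posterior_bound(graph, "a")
print(iterations, bound, scores)
```

The stopping rule looks only at consecutive iterates, so it adapts to how quickly a particular graph converges rather than fixing the iteration count in advance from a worst-case estimate.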

This work was supported by the National Basic Research Program of China (973 Program) (No. 2012CB316205), NSFC under grants No. 61272137, 61033010, 61202114, and 61165004, and NSSFC (No. 12&ZD220). It was partially done while the authors worked at the SA Center for Big Data Research at RUC. This Center is funded by a Chinese National 111 Project Attracting.

Author information

Authors and Affiliations

Key Lab of Data Engineering and Knowledge Engineering of MOE, and Department of Computer Science, Renmin University of China, China
Yinglong Zhang, Cuiping Li & Hong Chen
School of Software, East China Jiaotong University, China
Yinglong Zhang & Chengwang Xie

Editor information

Editors and Affiliations

School of Computing, University of Utah, 50 S. Central Campus Drive, 84112, Salt Lake City, UT, USA
Feifei Li
Department of Computer Science, Tsinghua University, 100084, Beijing, China
Guoliang Li
POSTECH, Republic of Korea
Seung-won Hwang
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Bin Yao
Advanced Digital Sciences Center (ADSC), 138632, Singapore, Singapore
Zhenjie Zhang

Cite this paper

Zhang, Y., Li, C., Xie, C., Chen, H. (2014). Accuracy Estimation of Link-Based Similarity Measures and Its Application. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_13

DOI: https://doi.org/10.1007/978-3-319-08010-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science (R0)
