Abstract
Link-based similarity measures play significant role in many graph based applications. Consequently, measuring nodes similarity in a graph is a fundamental problem of graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. In practice, PPR and SR scores are achieved by iterative computing. With increasing of iterations, the computations incur heavy overhead. The ideal solution is that computing similarity within the minimum number of iterations is sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, we focus on designing accurate and tight upper bounds of PPR and SR in the paper. Our upper bounds are designed based on following human intuition: “the smaller the difference between the two consecutive iteration step results is, the smaller the difference between iterative similarity scores and theoretical ones is”. Furthermore, we demonstrate effectiveness of our novel upper bounds in the scenario of top-k similar nodes query, where our upper bounds accelerate speed of the query. At last, we run a comprehensive set of experiments on real data sets to verify effectiveness and efficiency of our upper bounds
This work was supported by National Basic Research Program of China (973 Program)(No. 2012CB316205), NSFC under the grant No.61272137, 61033010, 61202114, 61165004 and NSSFC (No: 12&ZD220). It was partially done when the authors worked in SA Center for Big Data Research in RUC. This Center is funded by a Chinese National 111 Project Attracting.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Antonellis, I., Molina, H.G., Chang, C.C.: Simrank++: query rewriting through link analysis of the click graph. In: Proc. VLDB Endow., vol. 1(1), pp. 408–421 (2008)
Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized pagerank. In: SIGMOD 2013, pp. 445–456 (2013)
Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D., Zadeh, R.: Wtf: the who to follow service at twitter. In: WWW 2013, pp. 505–514 (2013)
Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: KDD, pp. 538–543 (2002)
Jeh, G., Widom, J.: Scaling personalized web search. In: WWW, pp. 271–279 (2003)
Jin, R., Ruan, N., Xiang, Y., Wang, H.: Path-tree: An efficient reachability indexing scheme for large directed graphs. ACM Trans. Database Syst. 7, 1–7 (2011)
Joshi, A., Kumar, R., Reed, B., Tomkins, A.: Anchor-based proximity measures. In: WWW 2007, pp. 1131–1132 (2007)
Lee, P., Lakshmanan, L.V.S., Yu, J.X.: On top-k structural similarity search. In: ICDE, pp. 774–785 (2012)
Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58(7), 1019–1031 (2007)
Lizorkin, D., Velikhov, P., Grinev, M.N., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. In: PVLDB, vol. 1(1), pp. 422–433 (2008)
Lizorkin, D., Velikhov, P., Grinev, M.N., Turdakov, D.: Accuracy estimate and optimization techniques for simrank computation. VLDB J. 19(1), 45–66 (2010)
Sarkar, P., Moore, A.W.: A tractable approach to finding closest truncated-commute-time neighbors in large graphs. In: UAI, pp. 335–343 (2007)
Sarkar, P., Moore, A.W., Prakash, A.: Fast incremental proximity search in large graphs. In: ICML, pp. 896–903 (2008)
Sun, L., Cheng, R., Li, X., Cheung, D.W., Han, J.: On link-based similarity join. In: PVLDB, vol. 4(11), pp. 714–725 (2011)
Yu, W., Lin, X., Zhang, W.: Towards efficient simrank computation on large networks. In: ICDE 2013, pp. 601–612 (2013)
Zhang, Y., Li, C., Chen, H., Sheng, L.: Fast simRank computation over disk-resident graphs. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013, Part II. LNCS, vol. 7826, pp. 16–30. Springer, Heidelberg (2013)
Zheng, W., Zou, L., Feng, Y., Chen, L., Zhao, D.: Efficient simrank-based similarity join over large graphs. In: PVLDB, vol. 6(7), pp. 493–504 (2013)
Zhu, F., Fang, Y., Chang, K.C.-C., Ying, J.: Incremental and accuracy-aware personalized pagerank through scheduled approximation. In: PVLDB, vol. 6(6), pp. 481–492 (2013)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, Y., Li, C., Xie, C., Chen, H. (2014). Accuracy Estimation of Link-Based Similarity Measures and Its Application. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_13
Download citation
DOI: https://doi.org/10.1007/978-3-319-08010-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer ScienceComputer Science (R0)