Abstract
Personalized PageRank expresses backlink-based page quality around user-selected pages, much as PageRank expresses quality over the entire Web. However, existing personalized PageRank algorithms can serve online queries only for a restricted choice of page selection. In this paper we achieve full personalization with a novel algorithm that computes a compact database of simulated random walks; this database can serve arbitrary personal choices of small subsets of web pages. We prove that, for a fixed error probability, the size of our database is linear in the number of web pages. We justify our estimation approach with asymptotic worst-case lower bounds: we show that exact personalized PageRank values can only be obtained from a database whose size is quadratic in the number of web pages.
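The core idea of answering arbitrary personalization queries from precomputed random walks can be illustrated with a short Python sketch. It relies on the standard fact that the personalized PageRank of page v with respect to page u equals the probability that a walk starting at u, which stops with probability c before each step, ends at v. The function names `build_fingerprint_db` and `personalized_pagerank`, the default c = 0.15, and the convention that a walk stops at a page without out-links are illustrative assumptions, not the paper's exact algorithm, encoding, or parameters.

```python
import random
from collections import Counter


def build_fingerprint_db(graph, n_fingerprints, c=0.15, seed=0):
    """Store n_fingerprints random-walk endpoints for every page (hypothetical helper).

    graph maps each page to the list of pages it links to. Before each step
    the walk stops with probability c, so its length is geometric.
    Assumption for this sketch: a walk reaching a page with no out-links stops there.
    """
    rng = random.Random(seed)
    db = {}
    for u in graph:
        endpoints = []
        for _ in range(n_fingerprints):
            v = u
            # Keep walking with probability 1 - c while an out-link exists.
            while rng.random() >= c and graph.get(v):
                v = rng.choice(graph[v])
            endpoints.append(v)
        db[u] = endpoints
    return db


def personalized_pagerank(db, pages):
    """Estimate PPR for the uniform personalization over `pages` (hypothetical helper).

    PPR is linear in the personalization vector, so the answer for a set is the
    average of the per-page vectors. Because every page stores the same number
    of fingerprints, pooling their endpoints computes exactly that average.
    """
    counts = Counter()
    total = 0
    for u in pages:
        counts.update(db[u])
        total += len(db[u])
    return {v: k / total for v, k in counts.items()}
```

For the three-page cycle 0 → 1 → 2 → 0 with c = 0.15, `personalized_pagerank(build_fingerprint_db({0: [1], 1: [2], 2: [0]}, 20000), [0])[0]` should approach the exact value c / (1 - (1 - c)^3) ≈ 0.389 as the number of fingerprints grows. Storing N endpoints per page gives N·|V| entries, which is the sense in which such a database is linear in the number of pages for a fixed N; the paper's bounds on N for a given error probability are not reproduced here.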
Research was supported by grants OTKA T 42559 and T 42706 of the Hungarian National Science Fund, and by the NKFP-2/0017/2002 project Data Riddle.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fogaras, D., Rácz, B. (2004). Towards Scaling Fully Personalized PageRank. In: Leonardi, S. (ed.) Algorithms and Models for the Web-Graph. WAW 2004. Lecture Notes in Computer Science, vol 3243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30216-2_9
DOI: https://doi.org/10.1007/978-3-540-30216-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23427-2
Online ISBN: 978-3-540-30216-2
eBook Packages: Springer Book Archive