Abstract
The PageRank algorithm is used in web information retrieval to provide a content-neutral ranking metric over web pages. It employs power method iterations to solve for the steady-state vector of a discrete-time Markov chain (DTMC). The one-step probability transition matrix that defines this DTMC is derived from the hyperlink structure of the web and from a model of web surfing behaviour which accounts for user bookmarks and memorised URLs.
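As a rough illustration of the power method described above, the following sketch iterates a rank vector on a small link graph until it stops changing. The function name `pagerank_power`, the damping value 0.85, the uniform jump vector `v` (standing in for bookmarks and memorised URLs) and the adjacency-list input are all assumptions made for this example; they are not taken from the paper.

```python
def pagerank_power(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a small graph.

    links[i] is the list of pages that page i links to; an empty list
    marks a dangling page (no outgoing links).
    """
    n = len(links)
    v = [1.0 / n] * n  # assumed uniform jump (personalisation) vector
    x = v[:]           # initial guess for the steady-state vector
    for _ in range(max_iter):
        new = [0.0] * n
        dangling_mass = 0.0
        for i, outs in enumerate(links):
            if outs:
                # Follow a hyperlink chosen uniformly at random.
                share = damping * x[i] / len(outs)
                for j in outs:
                    new[j] += share
            else:
                dangling_mass += x[i]
        # Dangling pages and random jumps both redistribute mass according to v.
        jump = damping * dangling_mass + (1.0 - damping)
        new = [new[j] + jump * v[j] for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


if __name__ == "__main__":
    # Page 3 has no outgoing links, so it is a dangling page.
    print(pagerank_power([[1, 2, 3], [2], [0], []]))
```

Each pass preserves the total probability mass, so the returned vector sums to one without an explicit normalisation step.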
In this paper we look to provide a more accessible, more broadly applicable explanation than has been given in the literature of how to make PageRank calculation more tractable through removal of the dangling-page matrix. This allows web pages without outgoing links to be removed before we employ power method iterations. It also allows decomposition of the problem according to irreducible subcomponents of the original transition matrix. Our explanation also covers a PageRank extension to accommodate TrustRank. In setting out our alternative explanation, we introduce and apply a general linear algebraic theorem which allows us to map homogeneous singular linear systems of index one to inhomogeneous non-singular linear systems with a shared solution vector. As an aside, we show in this paper that irreducibility is not required for PageRank to be well-defined.
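The idea of replacing the singular steady-state problem with a non-singular linear system can be sketched using a formulation that is well known in the PageRank literature: solve y(I - αP) = v, where P is the link matrix with all-zero rows for dangling pages, then normalise y. This is offered as an illustration under stated assumptions and is not claimed to be the theorem introduced in the paper. The function name `pagerank_via_linear_system`, the damping value α = 0.85 and the uniform vector `v` are hypothetical choices for this example. Because dangling rows of P are zero, the iteration below runs over non-dangling pages only and recovers the dangling pages in a single final pass.

```python
def pagerank_via_linear_system(links, damping=0.85, tol=1e-14, max_iter=10000):
    """Solve y (I - damping * P) = v, then normalise y to sum to one.

    links[i] lists the pages that page i links to; an empty list marks a
    dangling page, whose row in P is all zeros.
    """
    n = len(links)
    v = [1.0 / n] * n  # assumed uniform jump (personalisation) vector
    core = [i for i in range(n) if links[i]]  # pages with outgoing links

    # Fixed-point iteration y <- v + damping * y P, restricted to core pages.
    # It converges because the spectral radius of damping * P is below one.
    y = {i: v[i] for i in core}
    for _ in range(max_iter):
        new = {i: v[i] for i in core}
        for i in core:
            share = damping * y[i] / len(links[i])
            for j in links[i]:
                if j in new:  # ignore links into dangling pages for now
                    new[j] += share
        diff = sum(abs(new[i] - y[i]) for i in core)
        y = new
        if diff < tol:
            break

    # One pass recovers the dangling entries from the core solution.
    full = v[:]
    for i in core:
        full[i] = y[i]
    for i in core:
        share = damping * y[i] / len(links[i])
        for j in links[i]:
            if not links[j]:
                full[j] += share

    total = sum(full)
    return [value / total for value in full]


if __name__ == "__main__":
    print(pagerank_via_linear_system([[1, 2, 3], [2], [0], []]))
```

A non-uniform `v`, for instance one concentrated on a set of trusted pages in the spirit of TrustRank, could be substituted without changing the structure of the computation.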
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Jager, D.V., Bradley, J.T. (2009). PageRank: Splitting Homogeneous Singular Linear Systems of Index One. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_3
DOI: https://doi.org/10.1007/978-3-642-04417-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04416-8
Online ISBN: 978-3-642-04417-5
eBook Packages: Computer Science (R0)