Abstract
A power method formulation, which efficiently handles the problem of dangling pages, is investigated for parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often due to the evolving nature of the Web, is quite significant compared to the duration of the PageRank computation. To alleviate this problem, we utilize the information that sites form a natural clustering on pages to propose a site-based hypergraph-partitioning technique, which does not degrade the quality of the parallelization. We also propose an efficient parallelization scheme for matrix-vector multiplies in order to avoid possible communication due to the pages without in-links. Experimental results on realistic datasets validate the effectiveness of the proposed models.
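For readers unfamiliar with the dangling-page issue, the following is a minimal serial sketch of a textbook power iteration in which the rank mass held by pages without out-links is redistributed uniformly at every step. It is not the formulation or the parallelization scheme investigated in the paper. The function name `pagerank`, the adjacency-list input `out_links`, the damping factor `alpha = 0.85`, and the uniform teleportation vector are all assumptions made for illustration.

```python
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Textbook power iteration (illustrative, not the paper's method).

    out_links[i] is the list of page indices that page i links to.
    Pages with an empty list are dangling pages.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Total rank currently held by dangling pages (no out-links).
        dangling_mass = sum(rank[i] for i in range(n) if not out_links[i])
        # Every page receives the same teleportation share plus an equal
        # share of the dangling mass (uniform redistribution is assumed).
        base = (1.0 - alpha) / n + alpha * dangling_mass / n
        new_rank = [base] * n
        # Sparse matrix-vector multiply: push rank along each out-link.
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * rank[i] / len(targets)
                for j in targets:
                    new_rank[j] += share
        # L1 distance between successive iterates as the stopping test.
        diff = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if diff < tol:
            break
    return rank


if __name__ == "__main__":
    # Page 3 is dangling and also has no in-links.
    graph = [[1, 2], [2], [0], []]
    print(pagerank(graph))
```

In this toy graph, page 3 receives only the uniform base share in every iteration, so its rank never depends on values held by other pages' link contributions. Pages without in-links are the case for which the abstract proposes avoiding communication in the parallel matrix-vector multiply.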
This work is partially supported by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under project EEEAG-106E069.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cevahir, A., Aykanat, C., Turk, A., Cambazoglu, B.B. (2007). A Web-Site-Based Partitioning Technique for Reducing Preprocessing Overhead of Parallel PageRank Computation. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_108
DOI: https://doi.org/10.1007/978-3-540-75755-9_108
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75754-2
Online ISBN: 978-3-540-75755-9
eBook Packages: Computer Science, Computer Science (R0)