Abstract
Link-based analysis of web graphs has been explored extensively in many research projects. PageRank, which forms the basis of Google search, is one widely known approach. PageRank assigns a global importance score to a web page based on the importance of the other pages that point to it. It is an iterative algorithm applied to a massively connected graph of several hundred million nodes and hyperlinks. In this paper, we propose an efficient implementation of PageRank computation for a large subgraph of the web on a PC cluster. We discuss a link structure file that represents a web graph of several hundred million links, and an efficient PageRank algorithm that computes PageRank scores quickly. Experiments on a small cluster of x86-based PCs, using an artificial graph of 776 million links among 87 million nodes derived from the TH domain, show a run time of 30.77 seconds per iteration.
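As an illustration of the iterative computation described above, the following Python sketch runs PageRank power iteration over a compact, CSR-style link structure (an `offsets` array plus a flat `targets` array of out-links). It is a minimal single-process sketch under stated assumptions, not the authors' implementation: the CSR layout, the damping factor of 0.85, the uniform redistribution of rank held by pages without out-links, and the split of source pages across hypothetical "workers" (whose partial vectors are summed to stand in for a cluster-wide reduction) are all assumptions for illustration, and the function names are hypothetical.

```python
def build_link_structure(num_nodes: int, edges: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Pack (src, dst) edges into an assumed CSR-style link structure:
    the out-links of page u are targets[offsets[u]:offsets[u + 1]]."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    offsets = [0] * (num_nodes + 1)
    for u in range(num_nodes):
        offsets[u + 1] = offsets[u] + counts[u]
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets itself is not modified
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def worker_contribution(offsets: list[int], targets: list[int], ranks: list[float],
                        lo: int, hi: int) -> tuple[list[float], float]:
    """Work of one hypothetical worker that owns source pages [lo, hi).
    Returns a full-length partial contribution vector and the rank mass
    held by pages in the block that have no out-links (dangling pages)."""
    n = len(ranks)
    partial = [0.0] * n
    dangling = 0.0
    for u in range(lo, hi):
        start, end = offsets[u], offsets[u + 1]
        out_degree = end - start
        if out_degree == 0:
            dangling += ranks[u]
            continue
        share = ranks[u] / out_degree  # rank of u split evenly over its out-links
        for i in range(start, end):
            partial[targets[i]] += share
    return partial, dangling


def pagerank(offsets: list[int], targets: list[int], damping: float = 0.85,
             num_workers: int = 2, tol: float = 1e-10, max_iter: int = 100) -> list[float]:
    """Power iteration; each step sums the workers' partial vectors."""
    n = len(offsets) - 1
    ranks = [1.0 / n] * n
    # Contiguous blocks of source pages, one per hypothetical worker.
    bounds = [n * w // num_workers for w in range(num_workers + 1)]
    for _ in range(max_iter):
        total = [0.0] * n
        dangling = 0.0
        for w in range(num_workers):
            partial, d = worker_contribution(offsets, targets, ranks, bounds[w], bounds[w + 1])
            for v in range(n):
                total[v] += partial[v]  # stands in for a cluster-wide sum
            dangling += d
        # Teleport term plus dangling mass spread uniformly (assumed policy).
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = [base + damping * t for t in total]
        delta = sum(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Hypothetical 4-page graph: page 3 has no in-links, page 2 receives the most.
offsets, targets = build_link_structure(4, [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
ranks = pagerank(offsets, targets, num_workers=2)
print([round(r, 4) for r in ranks])
```

Because each worker reads only the out-links of its own block of source pages and emits a full-length partial vector, the per-iteration communication reduces to summing those vectors, which is one common way to parallelize PageRank on a cluster; the partitioning scheme used in the paper may differ.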
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rungsawang, A., Manaskasemsak, B. (2003). PageRank Computation Using PC Cluster. In: Dongarra, J., Laforenza, D., Orlando, S. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2003. Lecture Notes in Computer Science, vol 2840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39924-7_24
DOI: https://doi.org/10.1007/978-3-540-39924-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20149-6
Online ISBN: 978-3-540-39924-7
eBook Packages: Springer Book Archive