Abstract
The PageRank algorithm, used in the Google search engine, plays an important role in improving the quality of results by exploiting the explicit hyperlink structure among Web pages. The prestige that PageRank assigns to a Web page is derived solely from a surfer's random walk on the Web graph, without any consideration of textual content. In practice, however, user surfing behavior is far from random jumping. In this paper, we propose a link analysis that takes the textual information of Web pages into account. The results show that our proposed ranking algorithms perform better than the original PageRank.
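The contrast drawn in the abstract can be illustrated with a minimal sketch, which is not the paper's actual method. It runs the standard PageRank power iteration and optionally biases the surfer's choice of outgoing link by a per-link weight. The `weights` argument is a hypothetical stand-in for some text-matching score between a link and its target; the abstract does not state the paper's formula, so the function name, parameters, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, weights=None, damping=0.85, iterations=50):
    """Power-iteration PageRank on a small graph.

    links:   dict mapping a page to the list of pages it links to.
    weights: optional dict mapping (src, dst) to a non-negative score.
             Hypothetical stand-in for a text-matching score; when it is
             None, every outgoing link is equally likely (plain PageRank).
    """
    nodes = sorted(set(links) | {d for outs in links.values() for d in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random-jump share: every page receives (1 - damping) / n.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            outs = links.get(src, [])
            if weights is None:
                w = [1.0] * len(outs)
            else:
                w = [weights.get((src, dst), 0.0) for dst in outs]
            total = sum(w)
            if total == 0.0:
                # Dangling page (or all weights zero): spread rank uniformly.
                for v in nodes:
                    new_rank[v] += damping * rank[src] / n
            else:
                # Follow each link with probability proportional to its weight.
                for dst, wi in zip(outs, w):
                    new_rank[dst] += damping * rank[src] * wi / total
        rank = new_rank
    return rank


# Toy graph: page "a" links to "b" and "c"; "b" links to "c"; "c" links to "a".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
uniform = pagerank(links)
# Assume the link a -> b matches the target's text three times better than a -> c.
biased = pagerank(links, weights={("a", "b"): 3.0, ("a", "c"): 1.0,
                                  ("b", "c"): 1.0, ("c", "a"): 1.0})
```

In this toy graph the biased run assigns page "b" a higher score than the uniform run, because the surfer leaving "a" now prefers the better-matching link. Both runs still produce scores that sum to one.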
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, Y., Umemura, K. (2005). Literal-Matching-Biased Link Analysis. In: Myaeng, S.H., Zhou, M., Wong, KF., Zhang, HJ. (eds) Information Retrieval Technology. AIRS 2004. Lecture Notes in Computer Science, vol 3411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31871-2_14
DOI: https://doi.org/10.1007/978-3-540-31871-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25065-4
Online ISBN: 978-3-540-31871-2
eBook Packages: Computer Science, Computer Science (R0)