Abstract
Constructing a base set of topic-related web pages is a preliminary step for web mining algorithms that use link structure analysis based on HITS. However, apart from checking the anchor text of links and the content of pages, little research has addressed other ways to improve topic relevance while collecting the base set. In this paper, we propose a potential hub and authority first (PHA-first) approach that uses the concept of hubs and authorities to filter web pages. We survey the satisfaction of dozens of users with the pages recommended by our method and by HITS on different topics. The results indicate that our method is superior to HITS in most cases. We also evaluate the recall and precision of our method. The results show that our method has relatively high precision and low recall for all topics.
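For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of the standard HITS hub and authority iteration and of the precision and recall measures. It is not the PHA-first method, whose details the abstract does not give. The function names (`hits`, `normalize`, `precision_recall`), the toy link graph, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run the basic HITS iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical in-memory stand-in for a crawled base set).
    Returns (hub_scores, authority_scores).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to a page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of authority scores of the pages a page links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hit_count = len(retrieved & relevant)
    precision = hit_count / len(retrieved) if retrieved else 0.0
    recall = hit_count / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy graph: pages "a" and "b" both link to "c".
    hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
    print(max(auth, key=auth.get))  # page with the highest authority score

    # Two pages collected, both relevant, out of four relevant pages overall.
    print(precision_recall(["x", "y"], ["x", "y", "z", "w"]))
```

In the second call every collected page is relevant but only half of the relevant pages are collected, which is the kind of high-precision, low-recall outcome the abstract reports.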
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, LH., Lee, TW. (2005). Collecting Topic-Related Web Pages for Link Structure Analysis by Using a Potential Hub and Authority First Approach. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_98
DOI: https://doi.org/10.1007/11430919_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)