Abstract
Clustering is one of the most important techniques for dealing with (e.g., locating resources in and interpreting) the massive amount of heterogeneous information on the web, which is beyond a human being's capacity to digest. In this paper, we discuss the shortcomings of previous approaches and present a unifying clustering algorithm that clusters web search results for a specific query topic by combining link and contents information. In particular, we investigate how to combine link and contents analysis in the clustering process to improve the quality and interpretability of web search results. The proposed approach automatically clusters web search results into high-quality, semantically meaningful groups arranged in a concise, easy-to-interpret hierarchy with tagging terms. Preliminary experiments and evaluations show that the proposed approach is effective and promising.
Keywords: co-citation, coupling, anchor window, snippet
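To make the idea of combining link and contents information concrete, the following is a minimal sketch in Python. It is not the algorithm of the paper, whose details are not given in this abstract. It assumes each page is described by a set of out-links (for coupling), a set of in-links (for co-citation), and a set of terms drawn from its snippet or anchor window, and it assumes a simple weighted average of cosine similarities. The names `cosine`, `combined_similarity`, and the weight `w_link` are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sets, treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def combined_similarity(p, q, w_link=0.5):
    """Blend link-based and content-based similarity.

    The equal averaging of coupling and co-citation and the default
    weight w_link=0.5 are assumptions, not values from the paper.
    """
    coupling = cosine(p["out_links"], q["out_links"])    # shared outgoing links
    cocitation = cosine(p["in_links"], q["in_links"])    # shared citing pages
    link_sim = (coupling + cocitation) / 2
    content_sim = cosine(p["terms"], q["terms"])         # shared snippet/anchor terms
    return w_link * link_sim + (1 - w_link) * content_sim


# Toy pages: a and b share all links and terms, c shares nothing with a.
page_a = {"out_links": {"u1", "u2"}, "in_links": {"v1", "v2"}, "terms": {"jaguar", "car"}}
page_b = {"out_links": {"u1", "u2"}, "in_links": {"v1", "v2"}, "terms": {"jaguar", "car"}}
page_c = {"out_links": {"u9"}, "in_links": {"v9"}, "terms": {"cat"}}

sim_ab = combined_similarity(page_a, page_b)
sim_ac = combined_similarity(page_a, page_c)
```

A pairwise score of this kind could then be fed to any standard clustering procedure. Which procedure the authors use, and how they weight link against contents evidence, is described in the full paper rather than in this abstract.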
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Kitsuregawa, M. (2002). On Combining Link and Contents Information for Web Page Clustering. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_89
DOI: https://doi.org/10.1007/3-540-46146-9_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44126-7
Online ISBN: 978-3-540-46146-3
eBook Packages: Springer Book Archive