Abstract
With the proliferation of information on the Web, obtaining high-quality information has become a hot research topic in many fields, including databases, information retrieval (IR), and AI. The Web search engine is the most commonly used tool for information retrieval; however, its current state is far from satisfactory. In this paper, we propose a new approach that clusters the search results returned by a Web search engine using link analysis. Unlike document clustering algorithms in IR, which are based on common words or phrases shared between documents, our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard K-means clustering algorithm to handle noise more naturally and apply it to Web search results. By filtering out some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users' access and browsing. Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that clustering Web search results via link analysis is promising.
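The idea of comparing pages by shared links can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: the abstract does not specify the similarity measure or the K-means extension, so the cosine-style measure, the `threshold` parameter, the function names, and the sample data below are all assumptions made for illustration. Shared in-links stand in for co-citation, shared out-links for coupling, and a page that is not similar enough to any existing cluster starts its own group, which is one simple way to keep unrelated pages apart.

```python
from math import sqrt


def link_similarity(a, b):
    """Cosine-style similarity of two pages based on shared links.

    Each page is a dict with 'in' and 'out' sets of URLs (hypothetical layout).
    Shared in-links model co-citation; shared out-links model coupling.
    """
    shared = len(a["in"] & b["in"]) + len(a["out"] & b["out"])
    size_a = len(a["in"]) + len(a["out"])
    size_b = len(b["in"]) + len(b["out"])
    if size_a == 0 or size_b == 0:
        return 0.0
    return shared / sqrt(size_a * size_b)


def cluster_pages(pages, threshold=0.3):
    """Single-pass clustering; the first member of a cluster represents it.

    A page joins the most similar cluster if the similarity reaches
    `threshold` (an assumed value); otherwise it starts a new cluster.
    """
    clusters = []
    for name in pages:
        best, best_sim = None, 0.0
        for members in clusters:
            sim = link_similarity(pages[name], pages[members[0]])
            if sim > best_sim:
                best, best_sim = members, sim
        if best is not None and best_sim >= threshold:
            best.append(name)
        else:
            clusters.append([name])
    return clusters


# Made-up sample data: p1 and p2 share two in-links and one out-link.
pages = {
    "p1": {"in": {"a", "b"}, "out": {"x", "y"}},
    "p2": {"in": {"a", "b"}, "out": {"x", "z"}},
    "p3": {"in": {"c"}, "out": {"w"}},
}
clusters = cluster_pages(pages)
```

With this sample data, p1 and p2 end up in one group and p3 in a group of its own; small leftover groups of this kind are candidates for filtering as irrelevant pages.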
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Kitsuregawa, M. (2001). Link Based Clustering of Web Search Results. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_21
DOI: https://doi.org/10.1007/3-540-47714-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive