Link Based Clustering of Web Search Results

Wang, Yitong; Kitsuregawa, Masaru

doi:10.1007/3-540-47714-4_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

With the proliferation of information on the Web, obtaining high-quality information from it has become a hot research topic in many fields, including databases, IR, and AI. Web search engines are the most commonly used tools for information retrieval; however, their current state is far from satisfactory. In this paper, we propose a new approach that clusters the search results returned by a Web search engine using link analysis. Unlike document clustering algorithms in IR, which are based on common words or phrases shared between documents, our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard K-means clustering algorithm to handle noise more naturally and apply it to Web search results. By filtering out some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users' access and browsing. Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that clustering Web search results via link analysis is promising.
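
To make the idea of clustering by shared links concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the similarity formula or the details of the K-means extension. The sketch assumes cosine similarity over sets of out-links (coupling) and in-links (co-citation), averages the two scores, and treats a page as noise when its best similarity to any seed falls below a threshold. The names `cosine`, `link_similarity`, `assign`, the toy link data, and the value `threshold=0.2` are all hypothetical.

```python
from math import sqrt


def cosine(a: set, b: set) -> float:
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def link_similarity(p, q, out_links, in_links) -> float:
    """Average of coupling (shared out-links) and co-citation (shared in-links).

    Equal weighting of the two scores is an assumption made for illustration.
    """
    coupling = cosine(out_links.get(p, set()), out_links.get(q, set()))
    cocitation = cosine(in_links.get(p, set()), in_links.get(q, set()))
    return (coupling + cocitation) / 2


def assign(pages, seeds, out_links, in_links, threshold=0.2):
    """Assign each page to its most similar seed page, or to noise.

    This is a single assignment step with fixed seeds, not a full
    iterative K-means; the threshold value is hypothetical.
    """
    clusters = {s: [] for s in seeds}
    noise = []
    for p in pages:
        best = max(seeds, key=lambda s: link_similarity(p, s, out_links, in_links))
        if link_similarity(p, best, out_links, in_links) >= threshold:
            clusters[best].append(p)
        else:
            noise.append(p)
    return clusters, noise


# Toy data (invented): pages "a" and "b" share links; "d" has no links at all.
out_links = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"z"}}
in_links = {"a": {"h1"}, "b": {"h1"}, "c": {"h2"}}

clusters, noise = assign(["a", "b", "c", "d"], ["a", "c"], out_links, in_links)
print(clusters, noise)
```

In this toy run, the pages sharing both in-links and out-links land in the same group, while the page with no link information is set aside instead of being forced into a cluster, which is the kind of noise filtering the abstract alludes to.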

Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, Japan
Yitong Wang & Masaru Kitsuregawa

Editor information

Editors and Affiliations

Department of Information and Software Engineering, George Mason University, Fairfax, VA, 22030-4444, USA
X. Sean Wang
Department of Computer Science and Engineering, Northeastern University, Shenyang, 110004, China
Ge Yu
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Hongjun Lu

About this paper

Cite this paper

Wang, Y., Kitsuregawa, M. (2001). Link Based Clustering of Web Search Results. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_21

DOI: https://doi.org/10.1007/3-540-47714-4_21
Published: 28 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive