Link Based Clustering of Web Search Results

  • Conference paper
Advances in Web-Age Information Management (WAIM 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Abstract

With the proliferation of information on the Web, obtaining high-quality information from it has become a hot research topic in many fields, including databases, IR, and AI. Web search engines are the most commonly used tools for information retrieval; however, their current state is far from satisfactory. In this paper, we propose a new approach that uses link analysis to cluster the search results returned by a Web search engine. Unlike document clustering algorithms in IR, which are based on common words or phrases shared between documents, our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard K-means clustering algorithm so that it handles noise more naturally, and apply it to Web search results. By filtering out some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users' access and browsing. Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that clustering Web search results via link analysis is promising.
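
To make the idea of link-based similarity concrete, the following is a minimal sketch and not the paper's actual measure, which the abstract does not specify. It assumes each page is described by a set of out-links and a set of in-links, scores bibliographic coupling as the overlap of out-links and co-citation as the overlap of in-links, and averages the two. The Jaccard overlap, the equal weighting, the function names, and the toy pages are all illustrative assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Share of elements common to both sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_similarity(page_p: dict, page_q: dict) -> float:
    """Hypothetical similarity: mean of coupling and co-citation overlap."""
    # Coupling: the two pages link out to the same targets.
    coupling = jaccard(page_p["out"], page_q["out"])
    # Co-citation: the two pages are linked to by the same sources.
    cocitation = jaccard(page_p["in"], page_q["in"])
    return (coupling + cocitation) / 2


# Toy search results (made-up link sets, for illustration only).
pages = {
    "a": {"out": {"x", "y"}, "in": {"h1", "h2"}},
    "b": {"out": {"x", "y"}, "in": {"h1", "h3"}},
    "c": {"out": {"z"}, "in": {"h9"}},
}

sim_ab = link_similarity(pages["a"], pages["b"])  # share out-links and one in-link
sim_ac = link_similarity(pages["a"], pages["c"])  # share no links
```

A K-means style procedure could then assign each page to the cluster whose centroid it resembles most under such a measure, and one plausible way to handle noise is to leave a page unassigned when its best similarity falls below a threshold. Whether the paper does exactly this is not stated in the abstract.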

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Y., Kitsuregawa, M. (2001). Link Based Clustering of Web Search Results. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_21

  • DOI: https://doi.org/10.1007/3-540-47714-4_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42298-3

  • Online ISBN: 978-3-540-47714-3

  • eBook Packages: Springer Book Archive
