On Combining Link and Contents Information for Web Page Clustering

Wang, Yitong; Kitsuregawa, Masaru

doi:10.1007/3-540-46146-9_89

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2453)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Clustering is currently one of the most crucial techniques for dealing with (e.g. locating resources, interpreting information) the massive amount of heterogeneous information on the web, which is beyond a human being's capacity to digest. In this paper, we discuss the shortcomings of previous approaches and present a unifying clustering algorithm that clusters web search results for a specific query topic by combining link and contents information. In particular, we investigate how to combine link and contents analysis in the clustering process to improve the quality and interpretability of web search results. The proposed approach automatically clusters the web search results into high-quality, semantically meaningful groups in a concise, easy-to-interpret hierarchy with tagging terms. Preliminary experiments and evaluations are conducted, and the experimental results show that the proposed approach is effective and promising.

Keywords: co-citation, coupling, anchor window, snippet
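The abstract does not state how link and contents information are combined, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that co-citation is measured over shared in-links, coupling over shared out-links, and contents over snippet terms, each with cosine similarity, and that the three scores are mixed by a weighted sum. The function names, the page dictionary layout, and the equal default weights are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


def binary_vector(urls):
    """Turn a list of URLs into a 0/1 sparse vector (duplicates ignored)."""
    return {u: 1.0 for u in set(urls)}


def term_vector(text):
    """Turn a snippet into a term-frequency vector (whitespace tokens)."""
    return dict(Counter(text.lower().split()))


def combined_similarity(p, q, w_in=1 / 3, w_out=1 / 3, w_text=1 / 3):
    """Weighted sum of link-based and contents-based similarity.

    ASSUMPTION: the weighted-sum form and the weights are illustrative;
    the source abstract does not specify them.
    """
    co_citation = cosine(binary_vector(p["in_links"]), binary_vector(q["in_links"]))
    coupling = cosine(binary_vector(p["out_links"]), binary_vector(q["out_links"]))
    content = cosine(term_vector(p["snippet"]), term_vector(q["snippet"]))
    return w_in * co_citation + w_out * coupling + w_text * content


# Hypothetical search results for one query: same citing pages and same
# snippet terms, but no out-links in common.
p = {"in_links": ["a", "b"], "out_links": ["x"], "snippet": "jaguar car"}
q = {"in_links": ["a", "b"], "out_links": ["y"], "snippet": "Jaguar car"}
score = combined_similarity(p, q)
```

A pairwise score of this kind could feed any standard clustering procedure; treating the three sources as separate terms keeps it easy to see which kind of evidence drives a given grouping.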

Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, Japan
Yitong Wang & Masaru Kitsuregawa

Editor information

Editors and Affiliations

Université Paul Sabatier, IRIT, 118 route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Département Informatique, Université Aix-Marseille II, IUT, 413 Avenue Gaston Berger, 13625, Aix-en-Provence Cedex 1, France
Rosine Cicchetti
Institute of Applied Computer Science, University of Linz, Altenbergerstr. 69, 4040, Linz, Austria
Roland Traunmüller

About this paper

Cite this paper

Wang, Y., Kitsuregawa, M. (2002). On Combining Link and Contents Information for Web Page Clustering. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_89

DOI: https://doi.org/10.1007/3-540-46146-9_89
Published: 20 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44126-7
Online ISBN: 978-3-540-46146-3
eBook Packages: Springer Book Archive
