DOI: 10.1145/1099554.1099660
Article

Extracting a website's content structure from its link structure

Published: 31 October 2005

Abstract

Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy: a directed tree rooted at the Website's homepage, in which the vertices correspond to Web pages and the edges to hyperlinks. In this work, we propose an algorithm for extracting a Website's topic hierarchy from its link structure. The algorithm consists of a construction stage and a refining stage, in which we analyze the semantic relationships between Web pages based on link structure, Web page content, and directory structure. We have conducted extensive experiments on several different Websites and obtained very promising results.
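The two-stage idea described in the abstract can be illustrated with a toy Python sketch. This is not the authors' algorithm. The breadth-first construction, the directory-prefix refinement rule, the function names (construct_tree, refine_tree, dir_parts, shared_prefix, is_descendant) and the example.com URLs are all hypothetical choices made for illustration, and the sketch ignores Web page content entirely. The topic hierarchy is stored as a child-to-parent map whose root, the homepage, maps to None.

```python
from collections import deque
from urllib.parse import urlparse


def construct_tree(links, homepage):
    """Construction stage (toy version, an assumption): breadth-first search from the homepage.

    `links` maps each page URL to the list of URLs it hyperlinks to. Each newly
    reached page is attached to the first dequeued page that links to it, giving
    a child -> parent map (the homepage maps to None).
    """
    parent = {homepage: None}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


def dir_parts(url):
    """Directory components of a URL path: '/a/b/x.html' -> ['a', 'b'], '/a/b/' -> ['a', 'b']."""
    path = urlparse(url).path
    parts = [p for p in path.split("/") if p]
    return parts if path.endswith("/") else parts[:-1]


def shared_prefix(a, b):
    """Number of leading path components that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_descendant(tree, node, ancestor):
    """True if `ancestor` lies on the parent chain from `node` to the root (or node == ancestor)."""
    while node is not None:
        if node == ancestor:
            return True
        node = tree[node]
    return False


def refine_tree(links, tree):
    """Refining stage (hypothetical heuristic): move each page under the in-linking page
    whose URL directory shares the longest prefix with the page's own directory,
    skipping candidates that would create a cycle. Ties keep the current parent."""
    in_links = {}
    for src, targets in links.items():
        for target in targets:
            in_links.setdefault(target, []).append(src)

    refined = dict(tree)
    for page in tree:  # BFS discovery order
        if refined[page] is None:  # the homepage stays the root
            continue
        target_dir = dir_parts(page)
        best = refined[page]
        best_score = shared_prefix(dir_parts(best), target_dir)
        for src in in_links.get(page, []):
            if src not in refined or is_descendant(refined, src, page):
                continue  # unreachable from the homepage, or would form a cycle
            score = shared_prefix(dir_parts(src), target_dir)
            if score > best_score:
                best, best_score = src, score
        refined[page] = best
    return refined


# Hypothetical example site.
site = {
    "http://example.com/": ["http://example.com/news/", "http://example.com/products/",
                            "http://example.com/products/a.html"],
    "http://example.com/news/": ["http://example.com/news/1.html"],
    "http://example.com/products/": ["http://example.com/products/a.html",
                                     "http://example.com/products/b.html"],
}
tree = construct_tree(site, "http://example.com/")
refined = refine_tree(site, tree)
# BFS attaches products/a.html to the homepage (the first page linking to it);
# the refinement moves it under http://example.com/products/.
for page, par in refined.items():
    print(f"{page} -> {par}")
```

Breadth-first search gives every page a parent at minimum link distance from the homepage, a common baseline. The refinement only moves a page under a page that is not one of its own descendants, so the result remains a tree rooted at the homepage.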


Cited By

  • (2017) Mining the information architecture of the WWW using automated website boundary detection. Web Intelligence 15:4, 269-290. DOI: 10.3233/WEB-170365. Online publication date: 20-Nov-2017
  • (2010) Retaining knowledge for document management: Category‐tree integration by exploiting category relationships and hierarchical structures. Journal of the American Society for Information Science and Technology 61:7, 1313-1331. DOI: 10.1002/asi.21318. Online publication date: 14-Jun-2010
  • (2007) A link classification based approach to website topic hierarchy generation. Proceedings of the 16th international conference on World Wide Web, 1127-1128. DOI: 10.1145/1242572.1242728. Online publication date: 8-May-2007
  • (2006) Exploiting link analysis with a three-layer web structure model. Proceedings of the 7th international conference on Web Information Systems, 187-198. DOI: 10.1007/11912873_21. Online publication date: 23-Oct-2006

      Information

      Published In

      CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
      October 2005
      854 pages
      ISBN:1595931406
      DOI:10.1145/1099554
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. content structure
      2. topic hierarchy
      3. website mining

      Qualifiers

      • Article

      Conference

      CIKM05: Conference on Information and Knowledge Management
      October 31 - November 5, 2005
      Bremen, Germany

      Acceptance Rates

      CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 4
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 19 Feb 2025
