Article

Extracting a website's content structure from its link structure

Authors:

Nan Liu,

Christopher C. Yang

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

Pages 345 - 346

https://doi.org/10.1145/1099554.1099660

Published: 31 October 2005

Abstract

Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy, a directed tree rooted at the Website's homepage in which the vertices and edges correspond to Web pages and hyperlinks, respectively. In this work, we propose an algorithm for extracting a Website's topic hierarchy from its link structure. The proposed algorithm consists of a construction stage and a refining stage, in which we analyze the semantic relationships between Web pages based on link structure, Web page content, and directory structure. We conducted extensive experiments on different Websites and obtained very promising results.
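To make the data structure concrete, the following is a minimal sketch of a topic hierarchy as a directed tree rooted at the homepage. It is not the algorithm proposed in the paper: it assumes a plain breadth-first traversal as a naive stand-in for a construction stage, and it uses link structure only, ignoring page content and directory structure. The function name `bfs_topic_tree` and the toy URLs are hypothetical.

```python
from collections import deque


def bfs_topic_tree(links, homepage):
    """Return a {page: parent} map for a breadth-first tree rooted at homepage.

    `links` maps each page to the list of pages it links to.
    Assumption: the first link reached on a shortest path from the homepage
    becomes the tree edge; all other hyperlinks are discarded.
    """
    parent = {homepage: None}  # the root has no parent
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # keep only the first edge into a page
                parent[target] = page
                queue.append(target)
    return parent


# Hypothetical toy site: navigation links point back up and across.
links = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/", "/about"],
    "/products/a": ["/", "/products"],
    "/about": ["/"],
}
tree = bfs_topic_tree(links, "/")
for page, par in tree.items():
    print(page, "<-", par)
```

Every page reachable from the homepage ends up with exactly one parent, so the result is a tree even though the underlying link graph contains cycles. A refining stage, as described in the abstract, would then revisit edges using additional evidence.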

Cited By

Alshukri A, Coenen F (2017). Mining the information architecture of the WWW using automated website boundary detection. Web Intelligence 15:4 (269-290). Online publication date: 20-Nov-2017.
https://doi.org/10.3233/WEB-170365
Yang C, Lin J, Wei C (2010). Retaining knowledge for document management: Category-tree integration by exploiting category relationships and hierarchical structures. Journal of the American Society for Information Science and Technology 61:7 (1313-1331). Online publication date: 14-Jun-2010.
https://doi.org/10.1002/asi.21318
Liu N, Yang C, Williamson C, Zurko M, Patel-Schneider P, Shenoy P (2007). A link classification based approach to website topic hierarchy generation. Proceedings of the 16th international conference on World Wide Web (1127-1128). Online publication date: 8-May-2007.
https://dl.acm.org/doi/10.1145/1242572.1242728

Index Terms

Extracting a website's content structure from its link structure
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

October 2005

854 pages

ISBN:1595931406

DOI:10.1145/1099554

General Chair:
Otthein Herzog, University of Bremen, Germany

Program Chairs:
Hans-Jörg Schek, University for Health Sciences, Medical Informatics and Technology, Austria
Norbert Fuhr, University of Duisburg-Essen, Germany
Abdur Chowdhury, America Online, USA
Wilfried Teiken, IBM T.J. Watson Research Center, USA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIKM05: Conference on Information and Knowledge Management

October 31 - November 5, 2005

Bremen, Germany

Acceptance Rates

CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 4
Total Downloads: 502
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Feb 2025

Citations

Wang Q, Liu Y, Luo J (2006). Exploiting link analysis with a three-layer web structure model. Proceedings of the 7th international conference on Web Information Systems (187-198). Online publication date: 23-Oct-2006.
https://dl.acm.org/doi/10.1007/11912873_21
