Extracting the semantic content of web pages via repeated structures

Abstract:

Web pages may carry semantics that are very important to both authors and readers. For various reasons, authors may insert content that is irrelevant to the underlying semantics of the page at different positions, such as advertisements, guide bars, and links. As a result, using all the data of a web page to model its semantics may not give good results. In this paper, we propose a framework that extracts the real semantic content from web pages via repeated structures in the HTML data. Our algorithm first detects the real semantic blocks in web pages via repeated structure segmentation, then extracts the real semantic content of the pages from those blocks.
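
The abstract does not give the details of the segmentation step, so the following is only a rough sketch of the general idea of finding repeated structures in HTML, not the paper's algorithm. It assumes that a "repeated structure" is a group of sibling elements with the same tag shape, and that a group counts once it repeats at least `min_repeats` times. The names `Node`, `TreeBuilder`, `repeated_blocks`, and `min_repeats` are hypothetical and chosen for illustration. Deciding which repeated groups are real semantic blocks rather than boilerplate is left open here.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Minimal element node: tag name, child elements, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def signature(self):
        # Structural shape: own tag plus the shapes of child elements; text is ignored.
        return self.tag + "(" + ",".join(c.signature() for c in self.children) + ")"

    def all_text(self):
        # Direct text first, then each child's text (interleaving order is not preserved).
        parts = list(self.text)
        parts.extend(c.all_text() for c in self.children)
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from the parser's tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore unmatched end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.text.append(data)


def repeated_blocks(html, min_repeats=3):
    """Return the text of each sibling group whose structure repeats often enough."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        counts = Counter(c.signature() for c in node.children)
        for sig, n in counts.items():
            if n >= min_repeats:
                blocks.append([c.all_text() for c in node.children
                               if c.signature() == sig])
        stack.extend(node.children)
    return blocks
```

Calling `repeated_blocks(page_html)` on a page with a list of similarly structured items would return one list of text strings per repeated group. Exact signature matching is a deliberately strict choice; a tolerant comparison (for example, tree edit distance) would likely be needed on real pages.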

Published in: 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

Date of Conference: 15-19 July 2013

Date Added to IEEE Xplore: 03 October 2013

Electronic ISBN: 978-1-4799-1604-7

DOI: 10.1109/ICMEW.2013.6618450

Conference Location: San Jose, CA
