DOI: 10.1145/1593254.1593266
research-article

Keyphrase extraction for labeling a website topic hierarchy

Published: 12 August 2009

Abstract

Browsing a website page by page to locate useful information is tedious and time-consuming. Search engines are not always helpful because of the vocabulary mismatch between queries and web pages. Users may also have difficulty expressing their information needs accurately as queries at the beginning of the exploration stage. A site map provides an outline of the overall structure of a website. With a site map, users can easily identify the web page that satisfies their information needs without navigating through the website from the root page. However, site maps are not always available. In our previous work, we developed techniques to generate a website topic hierarchy. In this paper, we extend that work to extract keyphrases that label the website topic hierarchy. The keyphrases summarize the content so that users can efficiently browse the site map and pinpoint the web page that provides the information they need. The proposed keyphrase extraction has three major components. The first identifies candidate phrases. The second computes feature scores for summarization; the features include thematic and presentation features. The third extracts the keyphrases by combining the feature scores. We have conducted an experiment and obtained promising results.
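As an illustration only, the three components could be sketched as follows. The abstract does not specify the candidate-identification rule, the individual features, or the combination function, so everything concrete here is an assumption: the stopword list, the n-gram candidates, relative frequency as the thematic feature, occurrence in a title or heading as the presentation feature, the weighted sum, and all function and parameter names are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for",
             "on", "is", "are", "with", "our"}


def candidate_phrases(text, max_len=3):
    """Component 1 (assumed rule): break text at punctuation and stopwords,
    then emit every word n-gram of up to max_len words within each chunk."""
    phrases = []
    # Punctuation (other than hyphens) ends a chunk.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        chunk = []
        for word in fragment.split() + [None]:  # None flushes the last chunk
            if word is None or word in STOPWORDS:
                for n in range(1, max_len + 1):
                    for i in range(len(chunk) - n + 1):
                        phrases.append(" ".join(chunk[i:i + n]))
                chunk = []
            else:
                chunk.append(word)
    return phrases


def feature_scores(body, headings):
    """Component 2 (assumed features): a thematic score (frequency in the
    body, normalised by the most frequent candidate) and a presentation
    score (1.0 if the phrase also occurs in a title or heading)."""
    counts = Counter(candidate_phrases(body))
    heading_phrases = set()
    for heading in headings:
        heading_phrases.update(candidate_phrases(heading))
    max_count = max(counts.values(), default=1)
    scores = {}
    for phrase in set(counts) | heading_phrases:
        thematic = counts[phrase] / max_count
        presentation = 1.0 if phrase in heading_phrases else 0.0
        scores[phrase] = (thematic, presentation)
    return scores


def extract_keyphrases(body, headings, k=3, w_thematic=0.5, w_presentation=0.5):
    """Component 3 (assumed combination): rank candidates by a weighted sum
    of the two feature scores; ties are broken alphabetically."""
    scores = feature_scores(body, headings)
    combined = {p: w_thematic * t + w_presentation * s
                for p, (t, s) in scores.items()}
    return sorted(combined, key=lambda p: (-combined[p], p))[:k]


if __name__ == "__main__":
    body = "Laptop computers are on sale. Our laptop computers ship free."
    print(extract_keyphrases(body, headings=["Laptop Computers"]))
```

In this sketch a phrase that is both frequent in the body and present in a heading outranks one that is merely frequent. The equal weights are arbitrary placeholders; a real system would presumably tune or learn them.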

Published In

ICEC '09: Proceedings of the 11th International Conference on Electronic Commerce
August 2009
407 pages
ISBN:9781605585864
DOI:10.1145/1593254
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

  • School of Business, The University of Hong Kong, Hong Kong
  • Sayling Wen Cultural & Educational Foundation
  • Ministry of Education, Taiwan
  • College of Information Science and Technology, Drexel University, USA
  • Weatherhead School of Management, Case Western Reserve University, USA
  • College of Technology Management, National Tsing Hua University, Taiwan
  • National Science Council, Taiwan
  • Chinese Enterprise Resource Planning Society, Taiwan
  • International Center for Electronic Commerce, Korea Advanced Institute of Science & Technology, Korea

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICEC '09: International Conference on E-Commerce
August 12 - 15, 2009
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 150 of 244 submissions, 61%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 189
  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Feb 2025
