DOI: 10.1145/1321440.1321518

Learning query-biased web page summarization

Published: 06 November 2007

Abstract

Query-biased Web page summarization produces a summary of a Web page that reflects the page's relevance to a specific query. It plays an important role in how Web search engines present search results. In this paper, we propose a learning-based method for query-biased Web page summarization. The summarization problem is solved within the typical sentence selection framework. Unlike existing Web page summarization methods, which use either page content or link context alone, this work considers both as sources of sentences. Most existing learning-based summarization methods treat summarization as a sentence classification problem and train a classifier to discriminate between the extracted and non-extracted sentences of all training documents. The basic assumption of these methods is that sentences from different documents are comparable with respect to class information. In contrast to this classification scheme, we introduce a ranking scheme that ranks the extracted sentences of each training document higher than its non-extracted sentences. The underlying assumption, that sentences within a single document are comparable, is weaker and more reasonable than the assumption of the classification-based scheme. Extensive results obtained with intrinsic evaluation metrics gauge many aspects of the proposed method.
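The ranking scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features, the toy data, and the helper names (`make_pairs`, `train_pairwise_ranker`, `score`, `rank_sentences`) are all assumptions made for illustration, and a simple hinge-loss gradient descent over difference vectors stands in for a Ranking SVM. The point it shows is that preference pairs are built only between sentences of the same document, so sentences from different documents are never compared.

```python
import random

# Hypothetical toy data. Each sentence is (features, is_extracted), where
# features = [query_term_overlap, relative_position]. Both features are
# illustrative assumptions, not the feature set used in the paper.
# Note that the extracted sentence of DOC_B has a lower overlap (0.25) than a
# non-extracted sentence of DOC_A (0.3): the two documents are on different scales.
DOC_A = [
    ([0.9, 0.5], True),
    ([0.8, 0.1], True),
    ([0.3, 0.9], False),
    ([0.2, 0.4], False),
]
DOC_B = [
    ([0.25, 0.2], True),
    ([0.05, 0.8], False),
    ([0.10, 0.3], False),
]
DOCUMENTS = [DOC_A, DOC_B]


def make_pairs(documents):
    """Build preference pairs within each document only.

    Returns one difference vector (extracted - non_extracted) per pair;
    the ranker should give every such vector a positive score.
    """
    pairs = []
    for sentences in documents:
        positives = [f for f, extracted in sentences if extracted]
        negatives = [f for f, extracted in sentences if not extracted]
        for pos in positives:
            for neg in negatives:
                pairs.append([p - n for p, n in zip(pos, neg)])
    return pairs


def train_pairwise_ranker(pairs, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit linear weights with L2-regularized hinge loss on difference vectors."""
    rng = random.Random(seed)
    w = [0.0] * len(pairs[0])
    for _ in range(epochs):
        order = pairs[:]
        rng.shuffle(order)
        for diff in order:
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(len(w)):
                grad = reg * w[i]
                if margin < 1.0:  # pair is violated or inside the margin
                    grad -= diff[i]
                w[i] -= lr * grad
    return w


def score(w, features):
    """Linear score of one sentence."""
    return sum(wi * fi for wi, fi in zip(w, features))


def rank_sentences(w, feature_vectors):
    """Return sentence indices ordered from highest to lowest score."""
    return sorted(range(len(feature_vectors)),
                  key=lambda i: score(w, feature_vectors[i]),
                  reverse=True)


PAIRS = make_pairs(DOCUMENTS)
WEIGHTS = train_pairwise_ranker(PAIRS)

for name, doc in (("DOC_A", DOC_A), ("DOC_B", DOC_B)):
    print(name, rank_sentences(WEIGHTS, [f for f, _ in doc]))
```

A classification-based variant would instead pool all seven sentences into one training set with a single decision boundary, which is where the cross-document comparability assumption enters. At summarization time, the top-ranked sentences of a page would be selected as its summary.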

Published In

CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. query-biased web page summarization
  3. ranking
  4. support vector machines

Qualifiers

  • Research-article

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2021) Personalized Extractive Summarization for a News Dialogue System. 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 1044-1051. DOI: 10.1109/SLT48900.2021.9383568. Online publication date: 19-Jan-2021
  • (2020) Improving Search Snippets in Context-Aware Web Search Scenarios. Information Retrieval, pp. 3-16. DOI: 10.1007/978-3-030-56725-5_1. Online publication date: 10-Aug-2020
  • (2018) Determining Information Relevance Based on Personalization Techniques to Meet Specific User Needs. Business Information Systems and Technology 4.0, pp. 31-45. DOI: 10.1007/978-3-319-74322-6_3. Online publication date: 7-Mar-2018
  • (2016) Generating Personalized Snippets for Web Page Recommender Systems. Transactions of the Japanese Society for Artificial Intelligence 31:5, pp. C-G41_1-12. DOI: 10.1527/tjsai.C-G41. Online publication date: 2016
  • (2016) A Comparative Study of Query-biased and Non-redundant Snippets for Structured Search on Mobile Devices. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2389-2394. DOI: 10.1145/2983323.2983699. Online publication date: 24-Oct-2016
  • (2016) A Comparative Study of Answer-Contained Snippets and Traditional Snippets. Information Retrieval Technology, pp. 56-67. DOI: 10.1007/978-3-319-48051-0_5. Online publication date: 15-Oct-2016
  • (2015) Profile-Based Summarisation for Web Site Navigation. ACM Transactions on Information Systems 33:1, pp. 1-39. DOI: 10.1145/2699661. Online publication date: 17-Feb-2015
  • (2015) Query-biased summary generation assisted by query expansion. Journal of the Association for Information Science and Technology 66:5, pp. 961-979. DOI: 10.1002/asi.23222. Online publication date: 1-May-2015
  • (2013) Acquiring the Gist of Social Network Service Threads via Comparison with Wikipedia. Web-Based Multimedia Advancements in Data Communications and Networking Technologies, pp. 85-97. DOI: 10.4018/978-1-4666-2026-1.ch005. Online publication date: 2013
  • (2013) Snippet Generation by Identifying Attribute Associated Information. Information Retrieval Technology, pp. 50-61. DOI: 10.1007/978-3-642-45068-6_5. Online publication date: 2013
