Generating Incremental Length Summary Based on Hierarchical Topic Coverage Maximization

Published: 17 February 2016

Abstract

Document summarization plays an important role in coping with information overload on the Web. Many summarization models have been proposed recently, but few adjust the summary length and sentence order to the application scenario. With the popularity of handheld devices, presenting key information first in summaries of flexible length is a great convenience: it supports faster reading and decision-making and reduces network consumption. To address this problem, we introduce a novel task of generating summaries of incremental length. In particular, we require that the summaries automatically adjust their coverage of general and detailed information as the summary length varies. We propose a novel summarization model that incrementally maximizes topic coverage based on the document’s hierarchical topic model. In addition to the standard ROUGE-1 measure, we define a new evaluation metric based on the similarity of the summaries’ topic coverage distribution in order to account for sentence order and summary length. Extensive experiments on Wikipedia pages, DUC 2007, and general documents in a noninverted writing style from multiple sources show the effectiveness of our proposed approach. Moreover, we carry out a user study on a mobile application scenario to show the usability of the produced summaries in terms of improving judgment accuracy and speed, as well as reducing the reading burden and network traffic.
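The idea of incrementally maximizing topic coverage can be illustrated with a small greedy sketch. This is not the model from the article, whose objective and topic hierarchy are not given in the abstract. It assumes each sentence comes with per-topic coverage scores and each topic with an importance weight (for instance, higher for general topics near the root of a topic hierarchy). The names `greedy_incremental_summary`, `sentence_topics`, and `topic_weights` are hypothetical.

```python
from typing import Dict, List


def greedy_incremental_summary(
    sentence_topics: List[Dict[str, float]],
    topic_weights: Dict[str, float],
) -> List[int]:
    """Order sentences so that every prefix of the result is a summary.

    Assumption: the coverage of a topic is the best score any selected
    sentence gives it, and a sentence's gain is the weighted increase
    in coverage it would bring. Ties go to the lower sentence index.
    """
    covered = {topic: 0.0 for topic in topic_weights}
    remaining = set(range(len(sentence_topics)))
    order: List[int] = []

    def gain(index: int) -> float:
        # Weighted coverage increase if this sentence were added now.
        return sum(
            topic_weights[topic] * (max(covered[topic], score) - covered[topic])
            for topic, score in sentence_topics[index].items()
            if topic in topic_weights
        )

    while remaining:
        # sorted() makes tie-breaking deterministic: max keeps the first maximum.
        best = max(sorted(remaining), key=gain)
        for topic, score in sentence_topics[best].items():
            if topic in covered:
                covered[topic] = max(covered[topic], score)
        order.append(best)
        remaining.remove(best)
    return order


# Illustrative (made-up) inputs: one general topic and two detailed ones.
topic_weights = {"root": 0.5, "history": 0.3, "geography": 0.2}
sentence_topics = [
    {"history": 0.9},
    {"root": 0.8, "history": 0.2},
    {"geography": 0.7},
    {"root": 0.6},
]
order = greedy_incremental_summary(sentence_topics, topic_weights)
```

Under these assumptions, a summary of length k is simply the first k indices of `order`, so a longer summary always extends a shorter one and the most general sentence tends to come first.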


      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 7, Issue 3
      Regular Papers, Survey Papers and Special Issue on Recommender System Benchmarks
      April 2016
      472 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/2885506
      Editor: Yu Zheng
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 February 2016
      Accepted: 01 July 2015
      Revised: 01 May 2015
      Received: 01 December 2014
      Published in TIST Volume 7, Issue 3

      Author Tags

      1. Multi-document summarization
      2. data reconstruction

      Qualifiers

      • Research-article
      • Research
      • Refereed
