Generating Incremental Length Summary Based on Hierarchical Topic Coverage Maximization

Authors:

Tat Seng Chua

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 7, Issue 3

Article No.: 29, Pages 1 - 33

https://doi.org/10.1145/2809433

Published: 17 February 2016

Abstract

Document summarization is playing an important role in coping with information overload on the Web. Many summarization models have been proposed recently, but few try to adjust the summary length and sentence order according to application scenarios. With the popularity of handheld devices, presenting key information first in summaries of flexible length is of great convenience in terms of faster reading and decision-making and network consumption reduction. Targeting this problem, we introduce a novel task of generating summaries of incremental length. In particular, we require that the summaries should have the ability to automatically adjust the coverage of general-detailed information when the summary length varies. We propose a novel summarization model that incrementally maximizes topic coverage based on the document’s hierarchical topic model. In addition to the standard Rouge-1 measure, we define a new evaluation metric based on the similarity of the summaries’ topic coverage distribution in order to account for sentence order and summary length. Extensive experiments on Wikipedia pages, DUC 2007, and general noninverted writing style documents from multiple sources show the effectiveness of our proposed approach. Moreover, we carry out a user study on a mobile application scenario to show the usability of the produced summary in terms of improving judgment accuracy and speed, as well as reducing the reading burden and network traffic.
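The incremental-length idea in the abstract can be illustrated with a small greedy sketch. This is not the model proposed in the article: it assumes that each sentence already comes with per-topic coverage scores and that each topic has an importance weight (with general, upper-level topics weighted higher than detailed ones). The function name `incremental_summary`, the input dictionaries, and the capped-coverage gain are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def incremental_summary(
    sentence_topics: Dict[str, Dict[str, float]],
    topic_weights: Dict[str, float],
) -> List[str]:
    """Order sentences greedily so that every prefix is itself a summary.

    sentence_topics: sentence -> {topic: share of that topic it covers, 0..1}
    topic_weights:   topic -> importance (assumed larger for general topics)
    """
    covered = {topic: 0.0 for topic in topic_weights}
    remaining = dict(sentence_topics)
    order: List[str] = []

    def gain(sentence: str) -> float:
        # Weighted increase in coverage, with each topic's coverage capped at 1.
        total = 0.0
        for topic, share in remaining[sentence].items():
            if topic in topic_weights:
                new_cov = min(1.0, covered[topic] + share)
                total += topic_weights[topic] * (new_cov - covered[topic])
        return total

    while remaining:
        best = max(remaining, key=gain)  # sentence with the largest marginal gain
        order.append(best)
        for topic, share in remaining.pop(best).items():
            if topic in covered:
                covered[topic] = min(1.0, covered[topic] + share)
    return order


# Toy input: one general sentence and two detail sentences (made-up scores).
sentences = {
    "s_general": {"root": 0.9},
    "s_detail_a": {"a": 0.8},
    "s_detail_b": {"b": 0.8, "root": 0.2},
}
weights = {"root": 1.0, "a": 0.5, "b": 0.3}
order = incremental_summary(sentences, weights)
print(order)
```

Because sentences are appended one at a time by marginal gain, truncating `order` at any length yields a shorter summary that still leads with the most general content, which is the property the task asks for.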

Cited By

Al-Sabahi K, Zuping Z, Nadher M. 2018. A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS). IEEE Access 6 (2018), 24205--24212. https://doi.org/10.1109/ACCESS.2018.2829199

Index Terms

Generating Incremental Length Summary Based on Hierarchical Topic Coverage Maximization
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published In

ACM Transactions on Intelligent Systems and Technology Volume 7, Issue 3

Regular Papers, Survey Papers and Special Issue on Recommender System Benchmarks

April 2016

472 pages

ISSN: 2157-6904

EISSN: 2157-6912

DOI: 10.1145/2885506

Editor:
Yu Zheng
Microsoft Research, China

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 February 2016

Accepted: 01 July 2015

Revised: 01 May 2015

Received: 01 December 2014

Published in TIST Volume 7, Issue 3

Qualifiers

Research-article
Research
Refereed
