Multi-document summarization via submodularity

Li, Jingxuan; Li, Lei; Li, Tao

doi:10.1007/s10489-012-0336-1

Published: 09 February 2012

Applied Intelligence, Volume 37, pages 420–430 (2012)

Abstract

Multi-document summarization is an increasingly important problem in the Information Retrieval community. It aims to distill the most important information from a set of documents into a compressed summary. Given a set of documents as input, most existing multi-document summarization approaches use sentence selection techniques to extract a set of sentences from the document set as the summary. The submodularity inherent in term coverage and textual-unit similarity motivates us to incorporate this property into our solution to multi-document summarization tasks. In this paper, we propose a new principled and versatile framework for different multi-document summarization tasks using submodular functions (Nemhauser et al. in Math. Prog. 14(1):265–294, 1978) based on term coverage and textual-unit similarity, which can be optimized efficiently with an improved greedy algorithm. We show that four known summarization tasks (generic, query-focused, update, and comparative summarization) can be modeled as variations of the proposed framework. Experiments on benchmark summarization data sets (e.g., the DUC04-06, TAC08, and TDT2 corpora) demonstrate the effectiveness of the proposed framework for general multi-document summarization tasks.
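
To make the idea of greedily optimizing a term-coverage objective concrete, here is a minimal sketch. It is not the authors' framework or their improved greedy algorithm: it assumes a plain coverage function (the number of distinct terms covered), a cost equal to the number of distinct terms in a sentence, and a simple gain-per-cost selection rule. The names term_coverage, greedy_summary, and the toy data are hypothetical.

```python
def term_coverage(selected, sentences):
    """Count the distinct terms covered by the selected sentence indices."""
    covered = set()
    for i in selected:
        covered |= sentences[i]
    return len(covered)


def greedy_summary(sentences, budget):
    """Greedily pick sentences by marginal coverage gain per unit cost.

    sentences: list of sets of terms (assumed to be pre-tokenized)
    budget: maximum total cost, where a sentence costs its number of terms
    """
    selected, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        base = term_coverage(selected, sentences)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            cost = len(sentences[i])
            if cost == 0 or used + cost > budget:
                continue  # skip empty sentences and those that break the budget
            gain = term_coverage(selected + [i], sentences) - base
            if gain / cost > best_ratio:
                best, best_ratio = i, gain / cost
        if best is None:
            break  # nothing affordable adds new terms
        selected.append(best)
        used += len(sentences[best])
        remaining.discard(best)
    return selected


# Toy input: each sentence is reduced to its set of terms.
example = [
    {"storm", "hits", "coast"},
    {"storm", "hits", "coast", "hard"},
    {"rescue", "teams", "arrive"},
    {"storm", "rescue"},
]
summary = greedy_summary(example, budget=6)
```

Coverage is submodular because a sentence adds fewer new terms once overlapping sentences are already selected: in the toy data, sentence 3 contributes two new terms to an empty summary but only one after sentence 0 has been chosen.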

References

Chen CM, Liu CY (2009) Personalized e-news monitoring agent system for tracking user-interested Chinese news events. Appl Intell 30(2):121–141
Dang HT (2007) Overview of DUC 2007. In: Document understanding conference, pp 1–10
Dang HT, Owczarzak K (2008) Overview of the TAC 2008 update summarization task. In: Proceedings of text analysis conference
Daumé H, Marcu D (2006) Bayesian query-focused summarization. In: Annual meeting of the Association for Computational Linguistics, vol 44, p 305
Dimililer N, Varoğlu E, Altınçay H (2009) Classifier subset selection for biomedical named entity recognition. Appl Intell 31(3):267–282
Erkan G, Radev DR (2004) LexPageRank: Prestige in multi-document text summarization. In: Proceedings of EMNLP, vol 4
Conforti M, Cornuéjols G (1984) Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Appl Math 7(3):251–274
Goldstein J, Mittal V, Carbonell J, Kantrowitz M (2000) Multi-document summarization by sentence extraction. In: NAACL-ANLP 2000 workshop on automatic summarization. Association for Computational Linguistics, Stroudsburg, pp 40–48
Haghighi A, Vanderwende L (2009) Exploring content models for multi-document summarization. In: Proceedings of human language technologies: The 2009 annual conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, pp 362–370
Jurafsky D, Martin JH, Kehler A, Vander Linden K, Ward N (2000) Speech and language processing. Prentice Hall, New York
Khuller S, Moss A, Naor JS (1999) The budgeted maximum coverage problem. Inf Process Lett 70(1):39–45
Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, p 429
Li J, Li L, Li T (2011) MSSF: A multi-document summarization framework based on submodularity. In: Proceedings of SIGIR'11
Lin CY (2004) ROUGE: A package for automatic evaluation of summaries. In: Proceedings of the workshop on text summarization branches out (WAS 2004), pp 25–26
Lin H, Bilmes J (2010) Multi-document summarization via budgeted maximization of submodular functions. In: NAACL/HLT
Mani I (2001) Automatic summarization. Comput Linguist 28(2)
Minoux M (1978) Accelerated greedy algorithms for maximizing submodular set functions. Optim Tech 234–243
Nastase V (2008) Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, pp 763–772
Nemhauser GL, Wolsey LA (1981) Maximizing submodular set functions: formulations and analysis of algorithms. Stud Graphs Discrete Program 11:279–301
Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions. Math Program 14(1):265–294
Radev DR, Jing H, Sty M, Tam D (2004) Centroid-based summarization of multiple documents. Inf Process Manag 40(6):919–938
Saggion H, Bontcheva K, Cunningham H (2003) Robust generic and query-based summarisation. In: Proceedings of the European chapter of computational linguistics (EACL). Research notes and demos
Steinberger J, Jezek K (2004) Using latent semantic analysis in text summarization and summary evaluation. In: Proc. ISIM04, pp 93–100
Tang J, Yao L, Chen D (2009) Multi-topic based query-oriented summarization. In: Proceedings of SDM
Wan X, Yang J, Xiao J (2007) Manifold-ranking based topic-focused multi-document summarization. In: Proceedings of IJCAI, pp 2903–2908
Wan X, Yang J, Xiao J (2007) Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In: Annual meeting of the Association for Computational Linguistics, vol 45, p 552
Wang D, Li T, Zhu S, Ding C (2008) Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 307–314
Wang D, Zhu S, Li T, Gong Y (2009) Comparative document summarization via discriminative sentence selection. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, pp 1963–1966
Wei F, Li W, Lu Q, He Y (2008) Query-sensitive mutual reinforcement chain and its application in query-oriented multi-document summarization. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 283–290

Author information

Authors and Affiliations

School of Computing and Information Sciences, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA
Jingxuan Li, Lei Li & Tao Li

Corresponding author

Correspondence to Tao Li.

About this article

Cite this article

Li, J., Li, L. & Li, T. Multi-document summarization via submodularity. Appl Intell 37, 420–430 (2012). https://doi.org/10.1007/s10489-012-0336-1

Published: 09 February 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s10489-012-0336-1
