Multi-document summarization via submodularity

Applied Intelligence

Abstract

Multi-document summarization is becoming an important issue in the Information Retrieval community. It aims to distill the most important information from a set of documents into a compressed summary. Given a set of documents as input, most existing multi-document summarization approaches use sentence selection techniques to extract a set of sentences from the document set as the summary. The submodularity hidden in term coverage and textual-unit similarity motivates us to incorporate this property into our solution to multi-document summarization tasks. In this paper, we propose a new principled and versatile framework for different multi-document summarization tasks using submodular functions (Nemhauser et al. in Math. Prog. 14(1):265–294, 1978) based on term coverage and textual-unit similarity, which can be efficiently optimized with an improved greedy algorithm. We show that four known summarization tasks (generic, query-focused, update, and comparative summarization) can be modeled as variations of the proposed framework. Experiments on benchmark summarization data sets (e.g., the DUC04-06, TAC08, and TDT2 corpora) demonstrate the effectiveness of the proposed framework for general multi-document summarization tasks.
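
To make the idea of greedy optimization over a submodular term-coverage objective concrete, the following is a minimal sketch and not the framework proposed in the paper. A set function f is submodular when the gain from adding an element to a set never increases as the set grows, and counting the distinct terms covered by a set of sentences has this property. The names `term_coverage` and `greedy_summary`, the whitespace tokenization, the uniform term weights, the word-count cost, and the gain-per-word selection rule are all assumptions chosen for illustration; the textual-unit similarity component and the improved greedy algorithm mentioned above are not modeled here.

```python
from typing import List, Set


def term_coverage(term_sets: List[Set[str]]) -> int:
    """Number of distinct terms covered by the given sentences.

    Assumption: every term has weight 1. This coverage count is
    monotone and submodular in the set of chosen sentences.
    """
    return len(set().union(*term_sets))


def greedy_summary(sentences: List[str], budget: int) -> List[int]:
    """Greedily pick sentence indices under a word budget.

    At each step, choose the sentence with the largest coverage gain
    per word among those that still fit (an illustrative rule, not
    necessarily the paper's).
    """
    term_sets = [set(s.lower().split()) for s in sentences]
    costs = [len(s.split()) for s in sentences]
    chosen: List[int] = []
    used = 0
    remaining = set(range(len(sentences)))

    while remaining:
        base = term_coverage([term_sets[i] for i in chosen])
        best, best_ratio = None, 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            if costs[i] == 0 or used + costs[i] > budget:
                continue
            gain = term_coverage([term_sets[j] for j in chosen] + [term_sets[i]]) - base
            ratio = gain / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing fits or nothing adds new terms
            break
        chosen.append(best)
        used += costs[best]
        remaining.discard(best)
    return chosen


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat down", "dogs bark loudly"]
    print(greedy_summary(docs, budget=6))
```

With a budget of six words, the sketch first takes "the cat sat", then skips "the cat sat down" because it no longer fits, and finally adds "dogs bark loudly". The second sentence also illustrates diminishing returns: it would contribute four new terms to an empty summary but only one once the first sentence is selected.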

References

  1. Chen CM, Liu CY (2009) Personalized e-news monitoring agent system for tracking user-interested Chinese news events. Appl Intell 30(2):121–141

  2. Dang HT (2007) Overview of DUC 2007. In: Document understanding conference, pp 1–10

  3. Dang HT, Owczarzak K (2008) Overview of the TAC 2008 update summarization task. In: Proceedings of text analysis conference

  4. Daumé H, Marcu D (2006) Bayesian query-focused summarization. In: Annual meeting—Association for Computational Linguistics, vol 44, p 305

  5. Dimililer N, Varoğlu E, Altınçay H (2009) Classifier subset selection for biomedical named entity recognition. Appl Intell 31(3):267–282

  6. Erkan G, Radev DR (2004) LexPageRank: Prestige in multi-document text summarization. In: Proceedings of EMNLP, vol 4

  7. Conforti M, Cornuéjols G (1984) Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Appl Math 7(3):251–274

  8. Goldstein J, Mittal V, Carbonell J, Kantrowitz M (2000) Multi-document summarization by sentence extraction. In: NAACL-ANLP 2000 workshop on automatic summarization. Association for Computational Linguistics, Stroudsburg, pp 40–48

  9. Haghighi A, Vanderwende L (2009) Exploring content models for multi-document summarization. In: Proceedings of human language technologies: The 2009 annual conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, pp 362–370

  10. Jurafsky D, Martin JH, Kehler A, Vander Linden K, Ward N (2000) Speech and language processing. Prentice Hall, New York

  11. Khuller S, Moss A, Naor JS (1999) The budgeted maximum coverage problem. Inf Process Lett 70(1):39–45

  12. Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, p 429

  13. Li J, Li L, Li T (2011) MSSF: A multi-document summarization framework based on submodularity. In: Proceedings of SIGIR’11

  14. Lin CY (2004) ROUGE: A package for automatic evaluation of summaries. In: Proceedings of the workshop on text summarization branches out (WAS 2004), pp 25–26

  15. Lin H, Bilmes J (2010) Multi-document summarization via budgeted maximization of submodular functions. In: NAACL/HLT

  16. Mani I (2001) Automatic summarization. Comput Linguist 28(2)

  17. Minoux M (1978) Accelerated greedy algorithms for maximizing submodular set functions. Optim Tech 234–243

  18. Nastase V (2008) Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, pp 763–772

  19. Nemhauser GL, Wolsey LA (1981) Maximizing submodular set functions: formulations and analysis of algorithms. Stud Graphs Discrete Program 11:279–301

  20. Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions. Math Program 14(1):265–294

  21. Radev DR, Jing H, Sty M, Tam D (2004) Centroid-based summarization of multiple documents. Inf Process Manag 40(6):919–938

  22. Saggion H, Bontcheva K, Cunningham H (2003) Robust generic and query-based summarisation. In: Proceedings of the European chapter of computational linguistics (EACL). Research notes and demos

  23. Steinberger J, Jezek K (2004) Using latent semantic analysis in text summarization and summary evaluation. In: Proc. ISIM04, pp 93–100

  24. Tang J, Yao L, Chen D (2009) Multi-topic based query-oriented summarization. In: Proceedings of SDM

  25. Wan X, Yang J, Xiao J (2007) Manifold-ranking based topic-focused multi-document summarization. In: Proceedings of IJCAI, pp 2903–2908

  26. Wan X, Yang J, Xiao J (2007) Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In: Annual meeting—Association for Computational Linguistics, vol 45, p 552

  27. Wang D, Li T, Zhu S, Ding C (2008) Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 307–314

  28. Wang D, Zhu S, Li T, Gong Y (2009) Comparative document summarization via discriminative sentence selection. In: Proceedings of the 18th ACM conference on information and knowledge management. ACM, New York, pp 1963–1966

  29. Wei F, Li W, Lu Q, He Y (2008) Query-sensitive mutual reinforcement chain and its application in query-oriented multi-document summarization. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 283–290

Author information

Corresponding author

Correspondence to Tao Li.

About this article

Cite this article

Li, J., Li, L. & Li, T. Multi-document summarization via submodularity. Appl Intell 37, 420–430 (2012). https://doi.org/10.1007/s10489-012-0336-1

  • DOI: https://doi.org/10.1007/s10489-012-0336-1
