A hybrid machine learning model for multi-document summarization

Fattah, Mohamed Abdel

doi:10.1007/s10489-013-0490-0

Published: 20 December 2013

Applied Intelligence, volume 40, pages 592–600 (2014)


Mohamed Abdel Fattah^1,2

Abstract

This work proposes an approach that uses statistical tools to improve content selection in multi-document automatic text summarization. The method uses a trainable summarizer, which takes into account several features: the similarity of words among sentences, the similarity of words among paragraphs, the text format, cue-phrases, a score related to the frequency of terms in the whole document, the title, sentence location and the occurrence of non-essential information. The effect of each of these sentence features on the summarization task is investigated. These features are then used in combination to construct text summarizer models based on a maximum entropy model, a naive-Bayes classifier, and a support vector machine. To produce the final summary, the three models are combined into a hybrid model that ranks the sentences in order of importance. The performance of this new method has been tested using the DUC 2002 data corpus. The effectiveness of this technique is measured using the ROUGE score, and the results are promising when compared with some existing techniques.
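
The hybrid ranking step can be illustrated with a minimal sketch. The abstract does not state how the three models are combined, so the averaging rule below is an assumption, as are the function name `hybrid_rank`, its parameters, and the example scores. Each input list stands in for the per-sentence importance scores that a trained maximum entropy model, naive-Bayes classifier, and support vector machine might produce.

```python
from statistics import mean


def hybrid_rank(maxent_scores, nb_scores, svm_scores, top_k):
    """Pick the top_k sentences by the mean of three per-model scores.

    Each argument holds one score in [0, 1] per sentence (hypothetical
    classifier outputs). Averaging is an assumed combination rule; the
    abstract does not specify the one actually used.
    """
    if not (len(maxent_scores) == len(nb_scores) == len(svm_scores)):
        raise ValueError("all score lists must cover the same sentences")
    combined = [mean(triple)
                for triple in zip(maxent_scores, nb_scores, svm_scores)]
    # Rank sentence indices by combined score, highest first.
    # sorted() is stable, so tied sentences keep their document order.
    ranked = sorted(range(len(combined)), key=lambda i: -combined[i])
    # Return the chosen indices in document order so the summary reads naturally.
    return sorted(ranked[:top_k])


# Made-up scores for four sentences, for illustration only.
maxent = [0.9, 0.2, 0.6, 0.4]
naive_bayes = [0.8, 0.1, 0.7, 0.3]
svm = [0.7, 0.3, 0.5, 0.9]
print(hybrid_rank(maxent, naive_bayes, svm, top_k=2))  # prints [0, 2]
```

Other combination rules, such as majority voting or a weighted sum tuned on held-out data, would fit the same interface; the mean is used here only because it is the simplest choice.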

Acknowledgement

This work is supported by the Deanship of Scientific Research, Taibah University, KSA.

Author information

Authors and Affiliations

Department of Computer Sciences, CCSE Taibah University, KSA, Almadina Almonawara, Saudi Arabia
Mohamed Abdel Fattah
Department of Electronics Technology, FIE Helwan University, Cairo, Egypt
Mohamed Abdel Fattah

Corresponding author

Correspondence to Mohamed Abdel Fattah.

About this article

Cite this article

Fattah, M.A. A hybrid machine learning model for multi-document summarization. Appl Intell 40, 592–600 (2014). https://doi.org/10.1007/s10489-013-0490-0

Published: 20 December 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10489-013-0490-0
