
A New Method for Extractive Text Summarization Using Neural Networks

  • Original Research
  • Published in SN Computer Science

Abstract

Summarization aims at extracting the salient information from a document and presenting it in a condensed form. Most existing methods for extractive text summarization generate a summary from a document using a two-stage process. In the first stage, the sentences are ranked based on their saliency scores; in the second stage, summary generation starts with the top-ranked sentence and selects subsequent sentences one by one from the ranked list. To improve summary diversity, a sentence is included in the summary only if it is sufficiently dissimilar from the already selected sentences. Sentence selection continues until a summary of the desired length is reached. The second stage is greedy in nature and uses a predefined similarity threshold to check the dissimilarity of a candidate sentence with the already selected sentences. Because this similarity threshold is fixed and manually tuned, this approach fails, in most cases, to manage the diversity in a summary. This article proposes a summarization approach that uses a neural network-based learning model which learns to include a sentence in a summary by taking into account both the saliency of the sentence and the diversity of the summary. For this purpose, the model is trained using two types of features: saliency features and diversity features. We have evaluated the proposed approach on two open benchmark datasets, the DUC dataset and the Daily Mail dataset. Experimental results show that the proposed neural summarization approach is effective in producing non-redundant, informative summaries and outperforms many existing summarization approaches to which it is compared.
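The greedy second stage that the abstract criticizes can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: it assumes bag-of-words term frequencies and cosine similarity as the sentence-similarity measure, and the threshold value 0.5 and all function names are illustrative assumptions.

```python
def bow(sentence):
    """Bag-of-words term frequencies for a sentence."""
    counts = {}
    for w in sentence.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def cosine(u, v):
    """Cosine similarity between two bag-of-words dicts."""
    common = set(u) & set(v)
    dot = sum(u[w] * v[w] for w in common)
    nu = sum(x * x for x in u.values()) ** 0.5
    nv = sum(x * x for x in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def greedy_select(ranked_sentences, max_sentences, sim_threshold=0.5):
    """Stage 2 of the baseline: walk the saliency-ranked list and add a
    sentence only if it is sufficiently dissimilar (below the fixed
    threshold) from every sentence already selected."""
    summary = []
    for sent in ranked_sentences:
        vec = bow(sent)
        if all(cosine(vec, bow(s)) < sim_threshold for s in summary):
            summary.append(sent)
        if len(summary) == max_sentences:
            break
    return summary

# Example: the second sentence is nearly identical to the first and is
# skipped, so the third, dissimilar sentence is selected instead.
ranked = ["the cat sat on the mat",
          "the cat sat on a mat",
          "dogs bark loudly"]
print(greedy_select(ranked, 2))
# → ['the cat sat on the mat', 'dogs bark loudly']
```

The proposed method replaces this hand-tuned `sim_threshold` with a neural model trained on saliency and diversity features, so the inclusion decision is learned rather than fixed.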




Availability of Data and Materials

We have used two datasets: the DUC dataset (Document Understanding Conference: http://duc.nist.gov/) and the Daily Mail dataset (https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail), both of which can be accessed via the hyperlinks given in the footnotes.

Code Availability

The code used to implement the proposed model is custom code.

Notes

  1. https://duc.nist.gov/data.html.

  2. https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail.

  3. Document Understanding Conference: http://duc.nist.gov/.

  4. https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail.


Funding

No funding was received to assist with the preparation of this manuscript.

Author information


Contributions

Conceptualization: KS. Methodology: KS, SRC. Software: SRC. Validation: KS, SRC. Formal analysis: KS, SRC. Investigation: SRC. Resources: SRC. Data curation: SRC. Writing-original draft: KS, SRC. Writing-review and editing: KS, SRC. Visualization: SRC. Supervision: KS. Project administration: KS.

Corresponding author

Correspondence to Kamal Sarkar.

Ethics declarations

Conflict of Interest

The authors have no conflicts of interest.

Ethical Approval

Not applicable.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Research Involving Human Participants and/or Animals

No human participants or animals were involved in this research.

Informed Consent

This article does not involve any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chowdhury, S.R., Sarkar, K. A New Method for Extractive Text Summarization Using Neural Networks. SN COMPUT. SCI. 4, 384 (2023). https://doi.org/10.1007/s42979-023-01806-0

