Abstract
Summarization aims to extract the salient information from a document and present it in a condensed form. Most existing methods for extractive text summarization generate a summary using a two-stage process. In the first stage, the sentences are ranked by their saliency scores; in the second stage, summary generation starts with the top-ranked sentence and selects subsequent sentences one by one from the ranked list. To improve summary diversity, a sentence is included only if it is sufficiently dissimilar from the already selected sentences, and selection continues until the summary reaches the desired length. The second stage is greedy in nature and relies on a predefined similarity threshold to check the dissimilarity of a candidate sentence against the already selected ones. Because this fixed threshold is manually tuned, the approach often fails to manage diversity in the summary. This article proposes a summarization approach that uses a neural network-based learning model which learns to include a sentence in a summary by taking into account both the saliency of the sentence and the diversity of the summary. For this purpose, the model is trained using two types of features: saliency features and diversity features. We have evaluated the proposed approach on two open benchmark datasets, the DUC dataset and the Daily Mail dataset. Experimental results show that the proposed neural summarization approach is effective in producing better non-redundant, informative summaries and outperforms many existing summarization approaches to which it is compared.
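To make the baseline concrete, the conventional two-stage pipeline that the abstract contrasts against can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names, the bag-of-words representation, and the 0.5 similarity threshold are assumptions chosen for demonstration.

```python
def bow(sentence):
    """Bag-of-words vector of a sentence as a word -> count dict."""
    vec = {}
    for w in sentence.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two bag-of-words dicts."""
    num = sum(a[w] * b.get(w, 0) for w in a)
    den = (sum(v * v for v in a.values()) ** 0.5) * \
          (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def greedy_select(sentences, scores, max_sents, sim_threshold=0.5):
    """Stage 2 of the conventional pipeline: walk the saliency-ranked list
    and keep a sentence only if it is sufficiently dissimilar from every
    sentence already selected."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in ranked:
        cand = bow(sentences[i])
        if all(cosine(cand, bow(sentences[j])) < sim_threshold for j in selected):
            selected.append(i)
        if len(selected) == max_sents:
            break
    return [sentences[i] for i in sorted(selected)]  # restore document order
```

The fixed `sim_threshold` here is exactly the manually tuned parameter the article argues against: too low and salient sentences are rejected, too high and near-duplicates slip through. The proposed model replaces this hard cutoff by learning the inclusion decision from saliency and diversity features jointly.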
Availability of Data and Materials
We have used two different datasets: the DUC dataset (Document Understanding Conference: http://duc.nist.gov/) and the Daily Mail dataset (https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail), both of which can be accessed via the hyperlinks given here.
Code Availability
The code used to create the proposed model is custom code.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
Conceptualization: KS. Methodology: KS, SRC. Software: SRC. Validation: KS, SRC. Formal analysis: KS, SRC. Investigation: SRC. Resources: SRC. Data curation: SRC. Writing-original draft: KS, SRC. Writing-review and editing: KS, SRC. Visualization: SRC. Supervision: KS. Project administration: KS.
Corresponding author
Ethics declarations
Conflict of Interest
The authors have no conflicts of interest.
Ethical Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Research Involving Human Participants and/or Animals
No human participants or animals were involved in this research.
Informed Consent
This article does not involve any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chowdhury, S.R., Sarkar, K. A New Method for Extractive Text Summarization Using Neural Networks. SN COMPUT. SCI. 4, 384 (2023). https://doi.org/10.1007/s42979-023-01806-0