
Deep learning-based extractive text summarization with word-level attention mechanism

Published in: Multimedia Tools and Applications

Abstract

With the rapid growth of textual data on the internet, the demand for summarizing it in a short, readable, easy-to-understand form has increased, and much research is being carried out to improve the efficiency of text summarization systems. In the past, extractive text summarization relied mainly on human-crafted features, which were unable to learn semantic information from the text. Therefore, in an attempt to improve summary quality, we have designed a neural network-based, fully data-driven model for extractive single-document summarization, which we term WL-AttenSumm. Our model implements a word-level attention mechanism that focuses on the important parts of the input sequence, so that relevant semantic features are captured at the word level, which helps in selecting significant sentences for the summary. Another advantage of this model is that it can extract syntactic and semantic relationships from the text using a convolutional Bi-GRU (Bidirectional Gated Recurrent Unit) network. We trained the model on the combined CNN/Daily Mail corpus and evaluated it on the Daily Mail, combined CNN/Daily Mail, and DUC 2002 test sets for single-document summarization, obtaining better results than state-of-the-art baseline approaches in terms of ROUGE metrics. For summaries limited to 75 words, our attention-based approach achieves ROUGE recall scores (R-1, R-2, R-L) of 32.8%, 11.0%, and 27.5% on the Daily Mail corpus and 55.9%, 24.8%, and 53.9% on the DUC 2002 dataset, respectively. Experiments on the joint CNN/Daily Mail dataset yield full-length ROUGE F1 scores of 42.9%, 19.7%, and 39.3%. Our deep learning-based summarization framework therefore achieves competitive performance.
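The word-level attention described in the abstract scores each word's encoder hidden state, normalizes the scores with a softmax, and forms a weighted combination that emphasizes the most informative words. The following is a minimal NumPy sketch of that generic mechanism, not the paper's exact implementation: the hidden states stand in for Bi-GRU outputs, and the scoring vector `w` is a hypothetical learned parameter.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def word_level_attention(H, w):
    """Score each word's hidden state, normalize with softmax, and
    return the attention weights plus the weighted sentence vector.

    H : (seq_len, hidden_dim) word representations (e.g. Bi-GRU outputs)
    w : (hidden_dim,) scoring vector (learned in a real model)
    """
    scores = H @ w           # one relevance score per word, shape (seq_len,)
    alpha = softmax(scores)  # attention weights, non-negative, sum to 1
    return alpha, alpha @ H  # sentence vector, shape (hidden_dim,)

# Toy usage: 6 words with 4-dimensional hidden states.
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))
w = rng.standard_normal(4)
alpha, sent_vec = word_level_attention(H, w)
```

In a full model the sentence vectors produced this way would feed a sentence-level scorer that decides which sentences enter the extractive summary.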
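The ROUGE-N recall figures reported above measure the fraction of reference-summary n-grams that also appear in the system summary. A simple self-contained sketch of that computation (whitespace tokenization only; real evaluations use the ROUGE toolkit with stemming and stopword handling):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / total reference n-grams."""
    def ngrams(text, n):
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clipped overlap: each reference n-gram counts at most as often
    # as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

r1 = rouge_n_recall("the cat sat on the mat", "the cat was on the mat")
print(round(r1, 4))  # → 0.8333 (5 of 6 reference unigrams matched)
```

ROUGE-L, also reported in the abstract, is based on the longest common subsequence rather than fixed-length n-grams, rewarding in-order matches.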



Author information


Corresponding author

Correspondence to Vishal Gupta.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gambhir, M., Gupta, V. Deep learning-based extractive text summarization with word-level attention mechanism. Multimed Tools Appl 81, 20829–20852 (2022). https://doi.org/10.1007/s11042-022-12729-y
