A Novel Modified Harmonic Mean Combined with Cohesion Score for Multi-document Summarization

  • Conference paper
Distributed Computing and Intelligent Technology (ICDCIT 2022)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13145)

Abstract

The abundance of textual information generated daily on the web, social media, and other repositories makes extracting important information from a large corpus both critical and difficult. Automatic Text Summarization (ATS) addresses this need by reviewing many documents and extracting the relevant information from them. However, the computational bottlenecks associated with ATS need to be removed by finding efficient workarounds. Although existing research has worked toward such improvements, many limitations and challenges remain to be addressed. The current work proposes a semantic-based word similarity combined with sentence similarity to summarize a corpus of text documents. Finally, a relative entropy-based technique using KL-divergence is proposed, which arranges the sentences in the final summary according to their importance. Experimental results on DUC datasets are promising and show the potential of the proposed technique compared to other state-of-the-art approaches.
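
The relative entropy-based ordering mentioned in the abstract can be illustrated with a minimal sketch. The paper's actual formulation is not visible in this preview, so everything below is an assumption made for illustration: whitespace tokenization, additively smoothed unigram distributions, the direction KL(sentence || corpus), and the hypothetical helper names `unigram_dist`, `kl_divergence`, and `rank_sentences`. Under these assumptions, a sentence whose word distribution diverges less from the corpus distribution is treated as more representative and is placed earlier.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Smoothing (alpha > 0) keeps every probability non-zero, so the
    logarithm in the KL-divergence is always defined.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_sentences(sentences):
    """Order sentences by ascending KL(sentence || corpus).

    Assumption for illustration only: lower divergence from the
    corpus-wide word distribution means higher importance.
    """
    tokenized = [s.lower().split() for s in sentences]
    all_tokens = [t for toks in tokenized for t in toks]
    vocab = sorted(set(all_tokens))
    corpus = unigram_dist(all_tokens, vocab)
    scored = [
        (kl_divergence(unigram_dist(toks, vocab), corpus), s)
        for toks, s in zip(tokenized, sentences)
    ]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0])]
```

Calling `rank_sentences` on the candidate summary sentences returns the same sentences in the assumed order of importance. The opposite direction, KL(corpus || sentence), or a different smoothing scheme would be equally plausible choices.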

Corresponding author

Correspondence to Rajendra Kumar Roul.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Roul, R.K., Sahoo, J.K. (2022). A Novel Modified Harmonic Mean Combined with Cohesion Score for Multi-document Summarization. In: Bapi, R., Kulkarni, S., Mohalik, S., Peri, S. (eds) Distributed Computing and Intelligent Technology. ICDCIT 2022. Lecture Notes in Computer Science, vol 13145. Springer, Cham. https://doi.org/10.1007/978-3-030-94876-4_16

  • DOI: https://doi.org/10.1007/978-3-030-94876-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-94875-7

  • Online ISBN: 978-3-030-94876-4

  • eBook Packages: Computer Science (R0)
