A Novel Modified Harmonic Mean Combined with Cohesion Score for Multi-document Summarization

Roul, Rajendra Kumar; Sahoo, Jajati Keshari

doi:10.1007/978-3-030-94876-4_16


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13145)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology

Abstract

The volume of text generated daily on the web, social media, and other repositories makes extracting important information from a large corpus both critical and difficult. Automatic Text Summarization (ATS) addresses this problem by reviewing many documents and pulling out the relevant information from them. However, the computational bottlenecks associated with ATS still call for efficient workarounds, and although existing research has pursued such improvements, many limitations and challenges remain. This work proposes a semantic-based word similarity measure combined with sentence similarity to summarize a corpus of text documents. A relative entropy-based technique using KL-divergence is then proposed to arrange the sentences in the final summary according to their importance. Experimental results on DUC datasets are promising and show the potential of the proposed technique compared with other state-of-the-art approaches.
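
The abstract does not spell out how the KL-divergence ordering is computed, so the following is only a minimal illustrative sketch of relative entropy-based sentence ordering, not the authors' method. It assumes whitespace tokenization, add-one smoothed unigram distributions, and that a sentence whose distribution diverges less from the whole-corpus distribution is more important. The function names (`unigram_dist`, `kl_divergence`, `rank_sentences`) and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=1.0):
    """Smoothed unigram distribution over a fixed vocabulary.

    Assumption: additive (add-alpha) smoothing, so every word in the
    vocabulary has non-zero probability and the KL terms stay finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def rank_sentences(sentences):
    """Order sentences by increasing divergence from the corpus distribution.

    Assumption: lower KL(corpus || sentence) means the sentence is more
    representative of the corpus and is therefore placed earlier.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    corpus = unigram_dist([w for toks in tokenized for w in toks], vocab)
    scored = [
        (kl_divergence(corpus, unigram_dist(toks, vocab)), i)
        for i, toks in enumerate(tokenized)
    ]
    scored.sort()  # smallest divergence first; ties broken by original position
    return [sentences[i] for _, i in scored]


if __name__ == "__main__":
    for sentence in rank_sentences(["a b a b", "a b", "c"]):
        print(sentence)
```

In this toy input, the sentence "c" shares the least vocabulary mass with the corpus and so ends up last. A real system would replace the whitespace tokenizer and the smoothing choice with whatever the paper's experiments specify.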

Author information

Authors and Affiliations

Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Rajendra Kumar Roul
BITS-Pilani, K.K. Birla Goa Campus, Goa, India
Jajati Keshari Sahoo

Corresponding author

Correspondence to Rajendra Kumar Roul.

Editor information

Editors and Affiliations

International Institute of Information Technology, Hyderabad, India
Raju Bapi
Michigan State University, East Lansing, MI, USA
Sandeep Kulkarni
Ericsson India Global Services Private Ltd., Bangalore, India
Swarup Mohalik
Indian Institute of Technology Hyderabad, Kandi, Telangana, India
Sathya Peri

About this paper

Cite this paper

Roul, R.K., Sahoo, J.K. (2022). A Novel Modified Harmonic Mean Combined with Cohesion Score for Multi-document Summarization. In: Bapi, R., Kulkarni, S., Mohalik, S., Peri, S. (eds) Distributed Computing and Intelligent Technology. ICDCIT 2022. Lecture Notes in Computer Science, vol 13145. Springer, Cham. https://doi.org/10.1007/978-3-030-94876-4_16

DOI: https://doi.org/10.1007/978-3-030-94876-4_16
Published: 17 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94875-7
Online ISBN: 978-3-030-94876-4
eBook Packages: Computer Science, Computer Science (R0)