MOO-CMDS+NER: Named Entity Recognition-Based Extractive Comment-Oriented Multi-document Summarization

Roha, Vishal Singh; Saini, Naveen; Saha, Sriparna; Moreno, Jose G.

doi:10.1007/978-3-031-28238-6_49

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13981))

Included in the following conference series:

European Conference on Information Retrieval

1500 Accesses

Abstract

In this work, we propose an unsupervised extractive summarization framework for generating good quality summaries which are supplemented by the comments posted by the end-users. Using the evolutionary multi-objective optimization concept, different objective functions for assessing the quality of a summary, like diversity and the relevance of sentences in relation to comments, are optimized simultaneously. In the literature, named entity recognition (NER) has been shown to be useful in the summarization process. The current work is the first of its kind where we have introduced a new objective function that utilizes the concept of NER in news documents and user comments to score the news sentences. To test how well the new objective function works, different combinations of the NER-based objective function with already existing objective functions were tested on the English and French datasets using ROUGE 1, 2, and SU4 F1-scores. We have also investigated the abstractive and compressive summarization approaches for our comparative analysis. The code of the proposed work is available at the github repository https://github.com/vishalsinghroha/Unsupervised-Comment-based-Multi-document-Extractive-Summarization.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://github.com/vishalsinghroha/FrenchDatasetwithcomments.

References

Alami, N., Meknassi, M., En-nahnahi, N.: Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning. Expert Syst. Appl. 123, 195–211 (2019)
Article Google Scholar
Anand, D., Wagh, R.: Effective deep learning approaches for summarization of legal texts. J. King Saud Univ.-Comput. Inf. Sci. 34(5), 2141–2150 (2019)
Google Scholar
Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer. arXiv preprint arXiv:2004.05150 (2020)
Bing, L., Li, P., Liao, Y., Lam, W., Guo, W., Passonneau, R.J.: Abstractive multi-document summarization via phrase selection and merging. arXiv preprint arXiv:1506.01597 (2015)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res 3, 993–1022 (2003)
MATH Google Scholar
Boroş, E., et al.: Alleviating digitization errors in named entity recognition for historical documents. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 431–441 (2020)
Google Scholar
Erkan, G., Radev, D.R.: Lexrank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)
Article Google Scholar
Gao, S., Chen, X., Li, P., Ren, Z., Bing, L., Zhao, D., Yan, R.: Abstractive text summarization by incorporating reader comments. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 6399–6406 (2019)
Google Scholar
Goyal, A., Gupta, V., Kumar, M.: A deep learning-based bilingual Hindi and Punjabi named entity recognition system using enhanced word embeddings. Knowl.-Based Syst. 234, 107601 (2021)
Article Google Scholar
Hu, M., Sun, A., Lim, E.P.: Comments-oriented document summarization: understanding documents with readers’ feedback. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 291–298 (2008)
Google Scholar
Jain, R., Mavi, V., Jangra, A., Saha, S.: Widar-weighted input document augmented rouge. arXiv preprint arXiv:2201.09282 (2022)
Jangra, A., Saha, S., Jatowt, A., Hasanuzzaman, M.: Multi-modal summary generation using multi-objective optimization. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1745–1748 (2020)
Google Scholar
Lewis, et al.: Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)
Li, J., Sun, A., Han, J., Li, C.: A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 34(1), 50–70 (2020)
Article Google Scholar
Li, P., Bing, L., Lam, W.: Reader-aware multi-document summarization: an enhanced model and the first dataset. arXiv preprint arXiv:1708.01065 (2017)
Li, P., Wang, Z., Lam, W., Ren, Z., Bing, L.: Salience estimation via variational auto-encoders for multi-document summarization. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
Google Scholar
Lin, C.Y.: Rouge: A package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)
Google Scholar
Mihalcea, R., Tarau, P.: Textrank: Bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)
Google Scholar
Miller, D.: Leveraging bert for extractive text summarization on lectures. arXiv preprint arXiv:1906.04165 (2019)
Pontes, E.L., Huet, S., Torres-Moreno, J.M., Linhares, A.C.: Compressive approaches for cross-language multi-document summarization. Data Knowl. Eng. 125, 101763 (2020)
Article Google Scholar
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)
Google Scholar
Roha, V.S., Saini, N., Saha, S., Moreno, J.G.: Unsupervised framework for comment-based multi-document extractive summarization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 574–582 (2022)
Google Scholar
Saini, N., Saha, S., Jangra, A., Bhattacharyya, P.: Extractive single document summarization using multi-objective optimization: exploring self-organized differential evolution, grey wolf optimizer and water cycle algorithm. Knowl.-Based Syst. 164, 45–67 (2019)
Article Google Scholar
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V.: Xlnet: generalized autoregressive pretraining for language understanding. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Google Scholar
Zhang, J., Zhao, Y., Saleh, M., Liu, P.: Pegasus: pre-training with extracted gap-sentences for abstractive summarization. In: International Conference on Machine Learning, pp. 11328–11339. PMLR (2020)
Google Scholar
Zhou, A., Qu, B.Y., Li, H., Zhao, S.Z., Suganthan, P.N., Zhang, Q.: Multiobjective evolutionary algorithms: a survey of the state of the art. Swarm Evol. Comput. 1(1), 32–49 (2011)
Article Google Scholar

Download references

Acknowledgements

Dr. Sriparna Saha gratefully acknowledges the Young Faculty Research Fellowship (YFRF) Award, supported by Visvesvaraya Ph.D. Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia) for carrying out this research. Dr. Naveen Saini acknowledge the postdoctoral program of the CIMI LabEx and the support received from Indian Institute of Information Technology Lucknow, India. Dr. Jose G Moreno acknowledges TERMITRAD (2020-2019-8510010) and ANR-MEERQAT (ANR-19-CE23-0028) projects.

Author information

Authors and Affiliations

Indian Institute of Technology Patna, Bihar, India
Vishal Singh Roha & Sriparna Saha
Indian Institute of Information Technology, Lucknow, Uttar Pradesh, India
Naveen Saini
University of La Rochelle, L3i, 17000, La Rochelle, France
Jose G. Moreno
University of Toulouse, IRIT, Toulouse, France
Jose G. Moreno

Authors

Vishal Singh Roha
View author publications
You can also search for this author in PubMed Google Scholar
Naveen Saini
View author publications
You can also search for this author in PubMed Google Scholar
Sriparna Saha
View author publications
You can also search for this author in PubMed Google Scholar
Jose G. Moreno
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Vishal Singh Roha .

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Université Grenoble-Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
University of Tsukuba, Ibaraki, Japan
Hideo Joho
Dublin City University, Dublin, Ireland
Brian Davis
Dublin City University, Dublin, Ireland
Cathal Gurrin
Universität Regensburg, Regensburg, Germany
Udo Kruschwitz
Dublin City University, Dublin, Ireland
Annalina Caputo

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Roha, V.S., Saini, N., Saha, S., Moreno, J.G. (2023). MOO-CMDS+NER: Named Entity Recognition-Based Extractive Comment-Oriented Multi-document Summarization. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_49

Download citation

DOI: https://doi.org/10.1007/978-3-031-28238-6_49
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics