Abstract
In this work, we propose an unsupervised extractive summarization framework for generating good-quality summaries that are supplemented by the comments posted by end users. Using evolutionary multi-objective optimization, several objective functions that assess the quality of a summary, such as diversity and the relevance of sentences to comments, are optimized simultaneously. Named entity recognition (NER) has been shown in the literature to be useful for summarization. The current work is the first of its kind to introduce an objective function that uses NER over news documents and user comments to score the news sentences. To test how well the new objective function works, different combinations of the NER-based objective function with existing objective functions were evaluated on English and French datasets using ROUGE-1, ROUGE-2, and ROUGE-SU4 F1-scores. We have also investigated abstractive and compressive summarization approaches for our comparative analysis. The code of the proposed work is available in the GitHub repository https://github.com/vishalsinghroha/Unsupervised-Comment-based-Multi-document-Extractive-Summarization.
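To make the idea of scoring sentences by named entities under several objectives more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that entity sets have already been extracted by some NER tagger, uses a simple overlap ratio as a hypothetical stand-in for the NER-based objective, and treats every objective as something to maximize. The names `ner_overlap_score`, `dominates`, and `pareto_front` are illustrative and do not come from the paper or its repository.

```python
from typing import List, Set, Tuple

Objectives = Tuple[float, ...]


def ner_overlap_score(sentence_entities: Set[str], comment_entities: Set[str]) -> float:
    """Hypothetical NER-based score: the fraction of a sentence's named
    entities that also appear in the user comments."""
    if not sentence_entities:
        return 0.0
    return len(sentence_entities & comment_entities) / len(sentence_entities)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b, assuming all objectives are maximized:
    a is at least as good everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Each tuple is (diversity, NER-based score) for one candidate summary.
# The numbers are made up purely for illustration.
candidates = [(0.6, 0.4), (0.5, 0.5), (0.4, 0.4)]
front = pareto_front(candidates)
```

In a multi-objective setting, no single candidate has to win on every objective. The optimizer returns a set of non-dominated trade-offs, and one summary is then selected from that set.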
Acknowledgements
Dr. Sriparna Saha gratefully acknowledges the Young Faculty Research Fellowship (YFRF) Award, supported by the Visvesvaraya Ph.D. Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia), for carrying out this research. Dr. Naveen Saini acknowledges the postdoctoral program of the CIMI LabEx and the support received from the Indian Institute of Information Technology Lucknow, India. Dr. Jose G. Moreno acknowledges the TERMITRAD (2020-2019-8510010) and ANR-MEERQAT (ANR-19-CE23-0028) projects.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Roha, V.S., Saini, N., Saha, S., Moreno, J.G. (2023). MOO-CMDS+NER: Named Entity Recognition-Based Extractive Comment-Oriented Multi-document Summarization. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_49
DOI: https://doi.org/10.1007/978-3-031-28238-6_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science, Computer Science (R0)