Abstract
Text summary evaluation is an important step after building any summarization system. Despite the large number of evaluation metrics that have been developed, few of them target Arabic text summaries. In this paper, we present a new automatic metric for Arabic text summary evaluation. This metric combines ROUGE scores with document-embedding-based scores to build a regression model that predicts the manual score of an Arabic summary. First, we construct document-embedding models with different vector sizes; we then use these models to represent each candidate summary and each model (reference) summary as a document-embedding vector. Next, a similarity score between the two document-embedding vectors is computed. Finally, we combine several similarity scores based on the document-embedding representations with the ROUGE scores to predict a manual score. In the combination phase, we experiment with multiple regression models to select the best predictive model. The results show that the proposed method outperforms all baseline metrics on both the summary evaluation task and the summarization-system evaluation task.
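To make the described pipeline concrete, the following is a minimal sketch of its steps, assuming gensim's Doc2Vec and scikit-learn as stand-ins for the authors' actual tooling; all corpus texts, ROUGE values, and manual scores below are toy placeholders, not data from the paper.

```python
# Sketch of the pipeline: (1) train a document-embedding model,
# (2) embed candidate and reference summaries and compare them,
# (3) regress embedding similarities + ROUGE scores onto manual scores.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LinearRegression

# Toy corpus standing in for the Arabic training texts (assumption).
texts = ["the cat sat on the mat", "a dog ran in the park",
         "summaries are scored with rouge", "embeddings capture meaning"]
corpus = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(texts)]

# Step 1: build a document-embedding model with a chosen vector size.
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

def cosine(u, v):
    # Higher cosine similarity = more closely related vectors (note 1).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embedding_score(candidate, reference):
    # Step 2: represent each summary as a document-embedding vector
    # and score the candidate against the model (reference) summary.
    u = model.infer_vector(candidate.split())
    v = model.infer_vector(reference.split())
    return cosine(u, v)

# Step 3: combine embedding-based similarities with ROUGE scores and fit
# a regression model that predicts the manual score (toy values below).
emb = [embedding_score("a dog ran", "a dog ran in the park"),
       embedding_score("the cat sat", "the cat sat on the mat")]
rouge = [[0.60, 0.40], [0.75, 0.55]]   # e.g. ROUGE-1 / ROUGE-2, toy values
X = np.column_stack([emb, rouge])
y = np.array([3.5, 4.2])               # manual scores, toy values
predictor = LinearRegression().fit(X, y)
print(predictor.predict(X))
```

The paper compares several regression models at the combination stage; LinearRegression above is only the simplest possible choice.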
Notes
1. The most closely related vectors will obtain the highest cosine similarity.
2. The most closely related vectors will obtain the lowest Euclidean distance.
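For reference, the two measures behind these notes have the standard definitions (not reproduced from the paper itself):

```latex
\cos(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert},
\qquad
d(\mathbf{u},\mathbf{v}) = \lVert\mathbf{u}-\mathbf{v}\rVert_2 = \sqrt{\textstyle\sum_i (u_i - v_i)^2}.
```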
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ellouze, S., Jaoua, M., Atoui, A. (2022). C-DESERT Score for Arabic Text Summary Evaluation. In: Nguyen, N.T., Manolopoulos, Y., Chbeir, R., Kozierkiewicz, A., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2022. Lecture Notes in Computer Science, vol. 13501. Springer, Cham. https://doi.org/10.1007/978-3-031-16014-1_20