C-DESERT Score for Arabic Text Summary Evaluation

Conference paper, first published in Computational Collective Intelligence (ICCCI 2022).

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13501)

Abstract

Text summary evaluation is an important step after building any summarization system. Although many evaluation metrics have been developed, few of them evaluate Arabic text summaries. In this paper, we present a new automatic metric for Arabic text summary evaluation. The metric combines ROUGE scores with document-embedding-based scores to build a regression model that predicts the manual score of an Arabic summary. First, we construct document embedding models with different vector sizes; then we use these models to represent each candidate summary and each model (reference) summary as a document embedding vector. Next, a similarity score between the two document embedding vectors is calculated. Finally, we combine several similarity scores based on the document embedding representations with the ROUGE scores to predict a manual score. In the combination phase, we try multiple regression models to obtain the most accurate predictive model. The results show that the proposed method outperforms all baseline metrics on both the task of text summary evaluation and the task of summarization system evaluation.
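The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding vectors, ROUGE values, and regression weights below are hypothetical placeholders standing in for trained document embedding models and a fitted regression model.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two document embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_manual_score(features, weights, bias):
    """Linear combination of ROUGE and embedding-based scores,
    standing in for a fitted regression model."""
    return bias + sum(w * f for w, f in zip(weights, features))

# Hypothetical document embeddings for a candidate summary and a
# model (reference) summary, as produced by a doc2vec-style model.
candidate_vec = [0.2, 0.7, 0.1]
model_vec = [0.25, 0.6, 0.2]
embed_sim = cosine_similarity(candidate_vec, model_vec)

# Hypothetical ROUGE scores for the same candidate summary.
rouge_1, rouge_2 = 0.45, 0.21

# Combine all features to predict a manual score (weights are illustrative).
score = predict_manual_score([rouge_1, rouge_2, embed_sim],
                             weights=[0.4, 0.3, 0.3], bias=0.05)
print(round(score, 3))
```

In the paper the combination weights are learned by trying several regression models against manual scores; the fixed weights here only show the shape of the final prediction step.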


Notes

  1. The most closely related vectors will obtain the highest cosine similarity.

  2. The most closely related vectors will obtain the lowest Euclidean distance.
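The two footnotes above can be checked concretely: for a fixed reference vector, a near-duplicate vector scores higher on cosine similarity and lower on Euclidean distance than a dissimilar one. The vectors here are arbitrary examples, not embeddings from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (higher = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors (lower = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

reference = [0.3, 0.8, 0.5]
close = [0.32, 0.78, 0.52]  # nearly identical direction and magnitude
far = [0.9, 0.1, 0.2]       # clearly different vector

# Closer pair: higher cosine similarity, lower Euclidean distance.
assert cosine_similarity(reference, close) > cosine_similarity(reference, far)
assert euclidean_distance(reference, close) < euclidean_distance(reference, far)
```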


Author information

Corresponding author: Samira Ellouze.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ellouze, S., Jaoua, M., Atoui, A. (2022). C-DESERT Score for Arabic Text Summary Evaluation. In: Nguyen, N.T., Manolopoulos, Y., Chbeir, R., Kozierkiewicz, A., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2022. Lecture Notes in Computer Science, vol. 13501. Springer, Cham. https://doi.org/10.1007/978-3-031-16014-1_20

  • DOI: https://doi.org/10.1007/978-3-031-16014-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16013-4

  • Online ISBN: 978-3-031-16014-1

  • eBook Packages: Computer Science (R0)
