Abstract
Automatic text summarization aims to produce a shorter version of a document (or a set of documents). Extractive summarizers compile summaries by extracting a subset of sentences from a given text, while abstractive summarizers generate new sentences. Both types of summarizers strive to preserve the meaning of the original document as much as possible. Evaluating summarization quality is a challenging task. Due to the expense of human evaluation, many researchers prefer to evaluate their systems automatically, with the help of software tools. Automatic evaluation usually compares a system-generated summary to one or more human-written summaries according to selected measures. However, a single metric cannot reflect all quality-related aspects of a summary. For instance, evaluating an extractive summarizer by comparing its summaries, at the word level, to abstracts written by humans is insufficient, because the summaries being compared do not necessarily use the same vocabulary. Moreover, considering only single words does not reflect the coherence or readability of a generated summary. Multiple tools and metrics have been proposed in the literature for evaluating the quality of summarizers; however, studies show that these metrics do not always correlate with one another. In this paper we present the EvAluation SYstem for Summarization (EASY), which enables the evaluation of summaries with several quality measures. The EASY system can also compare system-generated summaries to the extractive summaries produced by the OCCAMS baseline, which is considered the best possible extractive summarizer. EASY currently supports two languages, English and French, and is freely available online for the NLP community.
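To make the word-level comparison concrete, the following minimal sketch computes a unigram-recall overlap (in the spirit of ROUGE-1) between a system summary and a reference abstract. The example texts, the whitespace tokenizer, and the function name are illustrative assumptions for this sketch, not part of EASY.

```python
from collections import Counter

def unigram_recall(system_summary: str, reference_summary: str) -> float:
    """ROUGE-1-style recall: fraction of reference unigrams covered by the system summary."""
    sys_counts = Counter(system_summary.lower().split())
    ref_counts = Counter(reference_summary.lower().split())
    # Clipped overlap: each reference word is matched at most as many times as it occurs there.
    overlap = sum(min(count, sys_counts[word]) for word, count in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)

# Hypothetical example: same meaning, different vocabulary -> low word-level score.
system = "The firm reported higher quarterly profits."
reference = "The company announced an increase in earnings for the quarter."
print(round(unigram_recall(system, reference), 2))  # ~0.1 despite similar meaning
```

Two summaries with the same meaning but different wording score poorly under such a word-level measure, which is precisely why a single metric is not enough and why EASY exposes several complementary quality measures.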
© 2023 Springer Nature Switzerland AG
Cite this paper
Litvak, M., Vanetik, N., Veksler, Y. (2023). EASY: Evaluation System for Summarization. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol. 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_40