
Data-Driven Unsupervised Evaluation of Automatic Text Summarization Systems

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9414)


Abstract

Automatic text summarization is a text compression problem with many applications in natural language processing. In this paper we focus on the problem of evaluating text summarization systems. We propose an unsupervised approach based on keywords: it does not require a large amount of manual processing and can be implemented as a fully automatic procedure. We also conduct a series of experiments with naïve informants and professional experts. The results of the experiments with informants, experts and automatically extracted keywords confirm that keywords, as one type of text compression, can be successfully used to evaluate summary quality. Our data is represented by (but not restricted to) different types of Russian news texts.
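
The abstract does not reproduce the scoring procedure itself, but the intuition behind keyword-based evaluation can be sketched as follows. The snippet below is an illustrative, simplified sketch and not the authors' exact metric: it scores a candidate summary by the fraction of reference keywords it covers, where the keyword list could come from informants, experts, or an automatic extractor such as TextAnalyst [8]. The function names and example data are hypothetical, and a real pipeline for Russian news texts would additionally need lemmatization.

    import re

    def tokenize(text):
        # Lowercased word tokens; a real pipeline for Russian would also lemmatize.
        return re.findall(r"\w+", text.lower())

    def keyword_coverage(summary, keywords):
        # Recall-style score: fraction of reference keywords that occur in the summary.
        summary_tokens = set(tokenize(summary))
        keywords = [k.lower() for k in keywords]
        if not keywords:
            return 0.0
        hits = sum(1 for k in keywords if k in summary_tokens)
        return hits / len(keywords)

    # Hypothetical usage: rank two candidate summaries of the same article
    # by how well they cover the reference keyword list.
    reference_keywords = ["parliament", "election", "budget", "reform"]
    summary_a = "Parliament approved the budget after the election."
    summary_b = "The weather in the capital was unusually warm."
    print(keyword_coverage(summary_a, reference_keywords))  # 0.75
    print(keyword_coverage(summary_b, reference_keywords))  # 0.0

Under these assumptions, a summary that mentions more of the reference keywords scores higher; a precision-style counterpart (fraction of summary content words that are keywords) could be added in the same way.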


Notes

  1. These terms (precision, etc.), well known in the NLP community, should be interpreted differently here: they represent metrics by which experts estimate the quality of summaries, rather than automatically calculated quality measures. For example, in [5] experts assign each summary precision and redundancy values from a rating scale.

  2. Initially, 25 articles were given to each of the informants, and they were asked to write down the words. However, after we received the first answers, we decided to change the instruction: we asked the informants to underline the words in the text. This helped us avoid misprints and preserve information about the positions of the words in the text. We also reduced the number of articles to 12, because the preliminary results showed that the number of errors invariably increased with the number of articles.

References

  1. Hennig, L., De Luca, E.W., Albayrak, S.: Learning summary content units with topic modeling. In: COLING 2010: Poster Volume, pp. 391–399 (2010)


  2. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 157–165 (1958)


  3. Nenkova, A., Passonneau, R.: Evaluating content selection in summarization: the pyramid method. In: HLT-NAACL 2004: Main Proceedings, pp. 145–152 (2004)


  4. Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC-3. In: Proceedings of the Third Text REtrieval Conference (TREC 1994) (1994)


  5. Solov’ev, A.N., Antonova, A.J., Pazel’skaja, A.G.: Using sentiment-analysis for text information extraction. In: Computational Linguistics and Intelligent Technology: Proceedings of the Annual International Conference “Dialogue”, vol. 11, no. 18, in 2 vols., vol. 1: The Main Program of the Conference, pp. 616–627. Publishing House of the Russian State Humanitarian University (2012)


  6. Yagunova, E.V., Makarova, O.E., Antonov, A.Y., Solovyov, A.N.: Various compression methods in the study of understanding the text of the news. In: Understanding in Communication. Man in the Information Space, vol. 2, pp. 414–421. Publishing House of YAGPU, Yaroslavl – Moscow (2012)


  7. Lenta.ru: Rambler Media Group. http://www.lenta.ru

  8. TextAnalyst. http://www.analyst.ru/index.php?lang=eng&dir=content/products/&id=ta


Acknowledgements

The authors acknowledge Saint-Petersburg State University for the research grant 30.38.305.2014.

Author information


Corresponding author

Correspondence to Elena Yagunova.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yagunova, E., Makarova, O., Pronoza, E. (2015). Data-Driven Unsupervised Evaluation of Automatic Text Summarization Systems. In: Pichardo Lagunas, O., Herrera Alcántara, O., Arroyo Figueroa, G. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2015. Lecture Notes in Computer Science(), vol 9414. Springer, Cham. https://doi.org/10.1007/978-3-319-27101-9_3


  • DOI: https://doi.org/10.1007/978-3-319-27101-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27100-2

  • Online ISBN: 978-3-319-27101-9

  • eBook Packages: Computer Science (R0)
