Generating Sport Summaries: A Case Study for Russian

Malykh, Valentin; Porplenko, Denis; Tutubalina, Elena

doi:10.1007/978-3-030-72610-2_11

Conference paper
First Online: 09 April 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12602)

Abstract

We present a novel dataset of sports broadcasts covering 8,781 games. The dataset contains 700 thousand comments and 93 thousand related news documents in Russian. We run an extensive series of experiments with modern extractive and abstractive approaches. The results demonstrate that BERT-based models show modest performance, reaching a ROUGE-1 F-measure of up to 0.26. In addition, human evaluation shows that neural approaches can generate feasible, although inaccurate, news based on the broadcast text.
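
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of a ROUGE-1 F-measure: the harmonic mean of unigram precision and recall between a generated summary and a reference. It assumes plain whitespace tokenization and lowercasing, which is a simplification; the function name `rouge1_f` and the example sentences are hypothetical and are not taken from the paper or from its evaluation setup.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F-measure between a candidate and a reference text.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a unigram counts only as often as it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 shared unigrams, 5 candidate and 7 reference tokens.
score = rouge1_f(
    "the home team scored twice",
    "the home team won after scoring twice",
)
```

Under this definition a score of 0.26 means that only a modest share of unigrams is shared between generated and reference news, once precision and recall are balanced against each other.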

Notes

1. The owner of the dataset approved its publication, so it will be released shortly after the paper is published.
2. https://github.com/google-research/bert/blob/master/multilingual.md
3. https://nlpub.mipt.ru/Russian_Distributional_Thesaurus
4. http://bit.ly/diploma_pagerank
5. https://radimrehurek.com/gensim/index.html
6. https://github.com/DenisOgr/lexrank/pull/1/files
7. https://pypi.org/project/lexrank/

Acknowledgements

The work of the first author was funded by RFBR, project number 19-37-60027. The final work on the manuscript, carried out by Elena Tutubalina, was funded within the framework of the HSE University Basic Research Program and the Russian Academic Excellence Project “5–100”.

Author information

Authors and Affiliations

Kazan Federal University, Kazan, Russian Federation
Valentin Malykh & Elena Tutubalina
Ukrainian Catholic University, Lviv, Ukraine
Denis Porplenko
National Research University Higher School of Economics, Moscow, Russian Federation
Elena Tutubalina

Corresponding author

Correspondence to Valentin Malykh.

Editor information

Editors and Affiliations

RWTH Aachen University, Aachen, Germany
Wil M. P. van der Aalst
University of Ljubljana, Ljubljana, Slovenia
Vladimir Batagelj
National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov
Krasovskii Institute of Mathematics and Mechanics, Yekaterinburg, Russia
Michael Khachay
National Research University Higher School of Economics, St. Petersburg, Russia
Olessia Koltsova
University of Oslo, Oslo, Norway
Andrey Kutuzov
National Research University Higher School of Economics, Moscow, Russia
Sergei O. Kuznetsov
National Research University Higher School of Economics, Moscow, Russia
Irina A. Lomazova
Moscow State University, Moscow, Russia
Natalia Loukachevitch
LORIA, Vandœuvre lès Nancy, France
Amedeo Napoli
Skolkovo Institute of Science and Technology, Moscow, Russia
Alexander Panchenko
University of Florida, Gainesville, FL, USA
Panos M. Pardalos
Università Ca' Foscari Venezia, Venice, Italy
Marcello Pelillo
National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Kazan Federal University, Kazan, Russia
Elena Tutubalina

About this paper

Cite this paper

Malykh, V., Porplenko, D., Tutubalina, E. (2021). Generating Sport Summaries: A Case Study for Russian. In: van der Aalst, W.M.P., et al. Analysis of Images, Social Networks and Texts. AIST 2020. Lecture Notes in Computer Science, vol 12602. Springer, Cham. https://doi.org/10.1007/978-3-030-72610-2_11

DOI: https://doi.org/10.1007/978-3-030-72610-2_11
Published: 09 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72609-6
Online ISBN: 978-3-030-72610-2
eBook Packages: Computer Science, Computer Science (R0)
