Flexible Context Extraction for Keywords in Russian Automatic Speech Recognition Results

Khomitsevich, Olga; Boyarsky, Kirill; Kanevsky, Eugeny; Bulusheva, Anna; Mendelev, Valentin

doi:10.1007/978-3-319-52920-2_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 661)

Included in the following conference series:

International Conference on Analysis of Images, Social Networks and Texts

Abstract

The paper addresses the extraction of contexts for keywords found in text, in particular in Automatic Speech Recognition (ASR) output. We propose using a syntactic parser to find contexts by analysing the sentence structure, rather than taking a fixed window of several words to the left and right of the keyword or the whole sentence. This method yields concise but meaningful contexts that are easily readable by humans and can also be used in applications such as thematic clustering. We describe the Russian SemSin system, which combines a syntactic dependency parser with elements of a semantic ontology. We demonstrate the use of SemSin for our task on both normal text and recognition output, and outline suggestions for future development of the method.
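
The contrast between a fixed word window and a structure-based context can be illustrated with a minimal sketch. It does not use SemSin. The `Token` class, the hand-annotated English dependency parse, and both function names are assumptions made for illustration, and the selection rule shown here (the keyword, its syntactic head, and its direct dependents) is only one simple possibility, not necessarily the rule used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int  # position of the word in the sentence
    text: str   # surface form
    head: int   # index of the syntactic head, -1 for the root


# Hypothetical, hand-annotated parse of:
# "the operator quickly changed my tariff plan yesterday"
SENTENCE = [
    Token(0, "the", 1),
    Token(1, "operator", 3),
    Token(2, "quickly", 3),
    Token(3, "changed", -1),
    Token(4, "my", 6),
    Token(5, "tariff", 6),
    Token(6, "plan", 3),
    Token(7, "yesterday", 3),
]


def window_context(tokens, keyword_index, size=2):
    """Baseline: `size` words on each side of the keyword."""
    lo = max(0, keyword_index - size)
    hi = min(len(tokens), keyword_index + size + 1)
    return [t.text for t in tokens[lo:hi]]


def syntactic_context(tokens, keyword_index):
    """Keyword, its head, and its direct dependents, in sentence order."""
    keep = {keyword_index}
    head = tokens[keyword_index].head
    if head >= 0:
        keep.add(head)
    keep.update(t.index for t in tokens if t.head == keyword_index)
    return [tokens[i].text for i in sorted(keep)]


if __name__ == "__main__":
    keyword = 6  # "plan"
    print(" ".join(window_context(SENTENCE, keyword)))
    print(" ".join(syntactic_context(SENTENCE, keyword)))
```

For the keyword "plan", the window picks up the unrelated adverb "yesterday" and misses the governing verb, while the structure-based selection keeps the verb and the keyword's modifiers.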

Acknowledgements

The work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008.

Author information

Authors and Affiliations

Speech Technology Center Ltd, St. Petersburg, Russia
Olga Khomitsevich & Valentin Mendelev
ITMO University, St. Petersburg, Russia
Kirill Boyarsky
St. Petersburg Institute for Economics and Mathematics, RAS, St. Petersburg, Russia
Eugeny Kanevsky
STC-Innovations Ltd, St. Petersburg, Russia
Anna Bulusheva

Corresponding author

Correspondence to Anna Bulusheva.

Editor information

Editors and Affiliations

National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov
Krasovsky Institute of Mathematics and Mechanics, Yekaterinburg, Russia
Mikhail Yu. Khachay
Ural Federal University, Yekaterinburg, Russia
Valeri G. Labunets
Research Computing Center, Lomonosov Moscow State University, Moscow, Russia
Natalia Loukachevitch
National Research University Higher School of Economics, St. Petersburg, Russia
Sergey I. Nikolenko
Technische Universität Darmstadt, Darmstadt, Germany
Alexander Panchenko
Laboratory of Algorithms and Technologies for Networks Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Dorodnicyn Computing Centre of Russian Academy of Sciences, Moscow, Russia
Konstantin Vorontsov

About this paper

Cite this paper

Khomitsevich, O., Boyarsky, K., Kanevsky, E., Bulusheva, A., Mendelev, V. (2017). Flexible Context Extraction for Keywords in Russian Automatic Speech Recognition Results. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_14

DOI: https://doi.org/10.1007/978-3-319-52920-2_14
Published: 17 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52919-6
Online ISBN: 978-3-319-52920-2
eBook Packages: Computer Science, Computer Science (R0)
