Flexible Context Extraction for Keywords in Russian Automatic Speech Recognition Results

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2016)

Abstract

This paper addresses the extraction of contexts for keywords found in text, in particular in Automatic Speech Recognition (ASR) output. We propose using a syntactic parser to find contexts by analysing the sentence structure, rather than taking a fixed window of several words to the left and right of the keyword, or the whole sentence. This method yields concise but meaningful contexts that are easy for humans to read and can also be used in applications such as thematic clustering. We describe SemSin, a system for Russian that combines a syntactic dependency parser with elements of a semantic ontology. We demonstrate the use of SemSin for our task on both normal text and recognition output, and outline directions for future development of the method.
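The contrast between a fixed word window and a structure-based context can be illustrated with a minimal sketch. It does not reproduce SemSin or the authors' extraction rules: the `Token` class, the functions `window_context` and `syntactic_context`, the selection rule (keyword, its head, and its direct dependents), and the hand-annotated English toy sentence are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position in the sentence
    text: str   # surface form
    head: int   # index of the syntactic head, -1 for the root


def window_context(tokens, kw, size=2):
    """Baseline: up to `size` words on each side of the keyword."""
    lo = max(0, kw - size)
    hi = min(len(tokens), kw + size + 1)
    return [t.text for t in tokens[lo:hi]]


def syntactic_context(tokens, kw):
    """Hypothetical rule: keyword, its head, and its direct dependents."""
    keep = {kw}
    if tokens[kw].head >= 0:
        keep.add(tokens[kw].head)
    keep.update(t.idx for t in tokens if t.head == kw)
    # Emit the selected words in their original sentence order.
    return [tokens[i].text for i in sorted(keep)]


# Hand-annotated toy dependency tree (an assumption, not parser output).
words = ["The", "operator", "quickly", "resolved",
         "my", "billing", "problem", "yesterday"]
heads = [1, 3, 3, -1, 6, 6, 3, 3]
tokens = [Token(i, w, h) for i, (w, h) in enumerate(zip(words, heads))]

kw = words.index("problem")
print(window_context(tokens, kw))     # ['my', 'billing', 'problem', 'yesterday']
print(syntactic_context(tokens, kw))  # ['resolved', 'my', 'billing', 'problem']
```

In this toy case the window picks up the adverbial "yesterday" and misses the governing verb, while the tree-based selection keeps the verb and the keyword's modifiers. A real system would need a parser that tolerates recognition errors and richer rules for deciding which relations to follow.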

Acknowledgements

The work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008.

Author information

Corresponding author

Correspondence to Anna Bulusheva.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Khomitsevich, O., Boyarsky, K., Kanevsky, E., Bulusheva, A., Mendelev, V. (2017). Flexible Context Extraction for Keywords in Russian Automatic Speech Recognition Results. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_14

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science (R0)
