Using Syllables as Acoustic Units for Spontaneous Speech Recognition

  • Conference paper
Text, Speech and Dialogue (TSD 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6231)

Abstract

In this work, we deal with advanced context-dependent automatic speech recognition (ASR) of spontaneous Czech speech using hidden Markov models (HMMs). Context-dependent units (e.g. triphones, diphones) in ASR systems provide a significant improvement over simple context-independent units. However, for spontaneous speech recognition we had to overcome some very challenging problems. For one, the number of syllables relative to the size of the spontaneous speech corpus makes the use of context-dependent units very difficult. The main part of this article describes the problems and procedures involved in effectively building and using a syllable-based ASR system with LASER (an ASR system developed at the Department of Computer Science and Engineering, Faculty of Applied Sciences). The procedures are usable with virtually any modern ASR system.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hejtmánek, J. (2010). Using Syllables as Acoustic Units for Spontaneous Speech Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_38

  • DOI: https://doi.org/10.1007/978-3-642-15760-8_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15759-2

  • Online ISBN: 978-3-642-15760-8

  • eBook Packages: Computer Science, Computer Science (R0)
