Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Causal Czech Speech

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

This paper describes an implementation of phonetic segmentation built on tools from the KALDI toolkit. The choice of KALDI is motivated by its active development and its support for current ASR techniques. The work is part of research on pronunciation variability in casual Czech speech, for which we use automatic phonetic segmentation to analyze particular phone boundaries, deletions, etc. We also present a tool for pronunciation detection. Both tools can be used to process large databases as well as for interactive work within the Praat environment. An illustrative analysis of segmentation accuracy and the design of a new environment for phonetic segmentation in Praat are also presented.
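As an illustration of what an analysis of segmentation accuracy can look like, the following minimal sketch computes the share of reference phone boundaries that have an automatically placed boundary within a given tolerance. The abstract does not state which measure the authors use, so the metric, the 20 ms default tolerance, the function name `boundary_agreement`, and the boundary times are all assumptions made for this example.

```python
from bisect import bisect_left


def boundary_agreement(reference, automatic, tolerance=0.020):
    """Return the fraction of reference boundaries (in seconds) that have
    an automatic boundary within `tolerance` seconds.

    Hypothetical helper for illustration; not taken from the paper.
    The matching is one-sided: one automatic boundary may match
    several reference boundaries.
    """
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    auto = sorted(automatic)
    hits = 0
    for t in reference:
        i = bisect_left(auto, t)
        # Only the nearest automatic boundary on each side of t can match.
        neighbours = auto[max(i - 1, 0):i + 1]
        if any(abs(t - a) <= tolerance for a in neighbours):
            hits += 1
    return hits / len(reference)


# Made-up boundary times (seconds) for a manual and an automatic segmentation.
manual = [0.10, 0.25, 0.40, 0.62]
aligned = [0.11, 0.29, 0.405, 0.61]
print(boundary_agreement(manual, aligned))
```

In this made-up example three of the four manual boundaries have an automatic boundary within 20 ms. In practice the boundary times would be read from the aligner output and from manually labelled Praat TextGrid files.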

Author information

Corresponding author

Correspondence to Zdenek Patc.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Patc, Z., Mizera, P., Pollak, P. (2015). Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Causal Czech Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_49

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
