Abstract
This paper describes an implementation of phonetic segmentation built with tools from the KALDI toolkit. The choice of KALDI is motivated by its active development and its support for current ASR techniques. The work is part of research on pronunciation variability in casual Czech speech, where automatic phonetic segmentation is used to analyze particular phone boundaries, deletions, etc. We also present a tool for pronunciation detection. Both tools can be used to process large databases as well as for interactive work within the Praat environment. An illustrative analysis of segmentation accuracy and the design of a new environment for phonetic segmentation in Praat are also presented.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Patc, Z., Mizera, P., Pollak, P. (2015). Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Causal Czech Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_49
DOI: https://doi.org/10.1007/978-3-319-24033-6_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science (R0)