Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Causal Czech Speech

Patc, Zdenek; Mizera, Petr; Pollak, Petr

doi:10.1007/978-3-319-24033-6_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

The paper describes an implementation of phonetic segmentation built on tools from the KALDI toolkit. The choice of KALDI is motivated by its active development and its support for current ASR techniques. The work is part of research on pronunciation variability in casual Czech speech, where automatic phonetic segmentation is used to analyze particular phone boundaries, deletions, etc. We also present a tool for pronunciation detection. Both tools can be used for processing large databases as well as for interactive work within the Praat environment. An illustrative analysis of the segmentation accuracy and the design of a new environment for phonetic segmentation in Praat are also presented.
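
The abstract does not say how segmentation accuracy was measured. As a rough illustration only, the sketch below computes one commonly used measure: the share of automatically placed phone boundaries that fall within a tolerance of manually placed reference boundaries. Everything in it is an assumption and not the authors' tool: the input is taken to be CTM-style lines (`utterance channel start duration label`) with times in seconds and phone symbols already mapped to text, the 20 ms tolerance is a conventional default, and the function names `parse_ctm`, `internal_boundaries`, and `boundary_accuracy` are hypothetical.

```python
from typing import Iterable, List, Tuple

# One phone segment: (start time in s, end time in s, phone label).
Segment = Tuple[float, float, str]


def parse_ctm(lines: Iterable[str]) -> List[Segment]:
    """Parse CTM-style lines: '<utt> <channel> <start> <duration> <label>'.

    Assumes times are in seconds and labels are phone symbols
    (hypothetical input; not taken from the paper).
    """
    segments: List[Segment] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start = float(fields[2])
        duration = float(fields[3])
        segments.append((start, start + duration, fields[4]))
    return segments


def internal_boundaries(segments: List[Segment]) -> List[float]:
    """Return the boundaries between consecutive phones (end of each but the last)."""
    return [end for _, end, _ in segments[:-1]]


def boundary_accuracy(auto: List[Segment], ref: List[Segment],
                      tol: float = 0.02) -> float:
    """Fraction of automatic boundaries within `tol` seconds of the reference.

    Assumes both segmentations contain the same phone sequence, so that
    boundaries can be compared position by position.
    """
    if len(auto) != len(ref):
        raise ValueError("segmentations must have the same number of phones")
    auto_b = internal_boundaries(auto)
    ref_b = internal_boundaries(ref)
    if not auto_b:
        return 1.0  # a single phone has no internal boundary to misplace
    hits = sum(1 for a, r in zip(auto_b, ref_b) if abs(a - r) <= tol)
    return hits / len(auto_b)


# Made-up alignments for one four-phone utterance.
auto_segments = parse_ctm([
    "utt1 1 0.00 0.10 a",
    "utt1 1 0.10 0.15 h",
    "utt1 1 0.25 0.20 o",
    "utt1 1 0.45 0.10 j",
])
ref_segments = parse_ctm([
    "utt1 1 0.00 0.11 a",
    "utt1 1 0.11 0.19 h",
    "utt1 1 0.30 0.16 o",
    "utt1 1 0.46 0.09 j",
])
accuracy = boundary_accuracy(auto_segments, ref_segments)
```

The position-by-position comparison only makes sense when both segmentations share one phone sequence. If phones are deleted, as in the reduced pronunciations the paper studies, the two sequences would first have to be aligned.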

Author information

Authors and Affiliations

Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
Zdenek Patc, Petr Mizera & Petr Pollak

Corresponding author

Correspondence to Zdenek Patc.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Patc, Z., Mizera, P., Pollak, P. (2015). Phonetic Segmentation Using KALDI and Reduced Pronunciation Detection in Causal Czech Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_49

DOI: https://doi.org/10.1007/978-3-319-24033-6_49
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
