Abstract
Speech technology makes it possible to compute statistics on word pronunciation variants and to investigate various phonetic phenomena. This is achieved through forced alignment of large amounts of speech signal with the possible pronunciation variants of the words. Such alignments are usually performed using an acoustical analysis with a 10 ms frame shift. As a result, the three-emitting-state structure of conventional acoustic hidden Markov models imposes a minimum duration of 30 ms on each phone segment. This constraint is not critical at low speaking rates, but may introduce artefacts at high speaking rates. This paper therefore investigates the impact of the acoustical frame rate on corpus-based phonetic statistics. Statistics on pronunciation variants obtained with a shorter frame shift (5 ms) are compared with those obtained with the standard 10 ms frame shift. The statistics are computed on a large speech corpus of more than 3 million running words and are analyzed with respect to the estimated local speaking rate. The results show some discrepancies between the two sets of statistics, in particular at high speaking rates, where the usual 10 ms frame shift leads to an underestimation of the frequency of the longest pronunciation variants.
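The minimum duration constraint mentioned in the abstract follows from simple arithmetic, sketched below. The sketch assumes a left-to-right phone model without skip transitions, so that each emitting state consumes at least one frame; the function names are illustrative and are not taken from the paper.

```python
def min_phone_duration_ms(emitting_states: int, frame_shift_ms: float) -> float:
    """Shortest phone segment the aligner can produce, in milliseconds.

    Assumption: left-to-right HMM without skip transitions, so every
    emitting state must be occupied for at least one frame.
    """
    return emitting_states * frame_shift_ms


def max_phones_per_second(emitting_states: int, frame_shift_ms: float) -> float:
    """Upper bound on the phone rate that can be aligned without artefacts."""
    return 1000.0 / min_phone_duration_ms(emitting_states, frame_shift_ms)


if __name__ == "__main__":
    # Compare the standard 10 ms frame shift with the shorter 5 ms one.
    for shift in (10.0, 5.0):
        duration = min_phone_duration_ms(3, shift)
        rate = max_phones_per_second(3, shift)
        print(f"frame shift {shift:g} ms: min phone duration {duration:g} ms, "
              f"at most {rate:.1f} phones per second")
```

Under these assumptions, halving the frame shift from 10 ms to 5 ms halves the minimum phone duration from 30 ms to 15 ms, which is why the shorter frame shift is expected to matter mostly at high speaking rates.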
Acknowledgments
This work has been partly realized thanks to the support of the Région Lorraine and the CPER MISN TALC project.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jouvet, D., Bartkova, K. (2015). Acoustical Frame Rate and Pronunciation Variant Statistics. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_12
DOI: https://doi.org/10.1007/978-3-319-25789-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)