Abstract
Various experiments have conclusively shown that superior continuous speech recognition performance is obtained when using context-dependent phonemic models. However, we have observed that an explicit context-dependent phonemic model can yield many transcriptions for a single lexicon entry. In this work, we study how to compress a word transcription dictionary (WTD) into a more compact form that balances flexibility against reliability. We develop a statistical model, based on a likelihood measure, for an automatic procedure that compresses a WTD. The compressed dictionary is then used for sentence recognition in a continuous speech recognition system. Experimental results indicate a substantial improvement in the recognition rate after compression.
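The abstract does not state the exact compression criterion, so the following is only a minimal sketch of one generic way likelihood-based pruning of a transcription dictionary could look. Everything in it is an assumption made for illustration: the function name `compress_dictionary`, the `keep_mass` parameter, the input layout (one log-likelihood per transcription), and the example scores are hypothetical and are not taken from the paper.

```python
from math import exp


def compress_dictionary(wtd, keep_mass=0.9):
    """Prune transcription variants per word (illustrative only).

    wtd maps each word to {transcription: log_likelihood}.
    For every word, the variants are ranked by likelihood and kept
    until their normalised likelihood mass reaches keep_mass.
    This criterion is an assumption, not the paper's method.
    """
    compressed = {}
    for word, variants in wtd.items():
        # Subtract the maximum before exponentiating for numerical stability.
        best = max(variants.values())
        weights = {t: exp(ll - best) for t, ll in variants.items()}
        total = sum(weights.values())

        # Rank variants from most to least likely.
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

        kept, mass = [], 0.0
        for transcription, weight in ranked:
            kept.append(transcription)
            mass += weight / total
            if mass >= keep_mass:
                break
        compressed[word] = kept
    return compressed


# Hypothetical scores for three variants of one lexicon entry.
example_wtd = {
    "data": {
        "d ey t ah": -10.0,
        "d ae t ah": -10.5,
        "d aa t ah": -14.0,
    }
}
compact = compress_dictionary(example_wtd, keep_mass=0.9)
```

Raising `keep_mass` retains more variants (more flexibility), while lowering it keeps only the best-supported ones (more reliability), which mirrors the trade-off the abstract describes.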
Keywords
- Continuous Speech
- Speaking Rate
- Continuous Speech Recognition
- Phonetic Transcription
- Sentence Recognition
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mouria-Beji, F. (1998). A statistical model for an automatic procedure to compress a word transcription dictionary. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds) Advances in Pattern Recognition. SSPR/SPR 1998. Lecture Notes in Computer Science, vol 1451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0033335
DOI: https://doi.org/10.1007/BFb0033335
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64858-1
Online ISBN: 978-3-540-68526-5
eBook Packages: Springer Book Archive