Abstract
In this study, we propose an effective front-end technique for improving the performance of telephone speech recognition. Much prior work has concentrated on compensating, at the front-end stage of speech recognition, for the noise and channel distortions contained in telephone speech. Starting from RASTA processing, which is well known for its channel-robust feature parameters, we sought to improve the method further by using the channel estimation capability of cepstral mean subtraction and the maximum likelihood method. The proposed method, a hybrid of channel estimation and RASTA processing, proved effective in experiments performed on real telephone speech data.
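As background for the two building blocks named above, the following is a minimal sketch of textbook cepstral mean subtraction and a RASTA-style band-pass filter. It is not the hybrid method proposed in the paper, whose details the abstract does not give, and it omits the maximum likelihood estimation step. The function names, the plain-list data layout (one list of coefficients per frame), and the filter pole value of 0.98 are assumptions made for illustration. Both techniques rely on the same fact: a fixed linear channel appears as a constant additive offset in the cepstral (log-spectral) domain, so removing the constant or slowly varying component of each coefficient's trajectory removes the channel.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    frames: list of frames, each a list of cepstral coefficients (assumed layout).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter the time trajectory of a single coefficient.

    Causal form of the commonly cited RASTA filter:
        y[n] = pole * y[n-1] + 0.1 * (2x[n] + x[n-1] - x[n-3] - 2x[n-4])
    The numerator taps sum to zero, so a constant input is driven to zero.
    The pole value is an assumed, tunable parameter.
    """
    taps = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    prev = 0.0
    for n in range(len(trajectory)):
        # Samples before the start of the trajectory are treated as zero.
        fir = sum(taps[i] * trajectory[n - i] for i in range(len(taps)) if n - i >= 0)
        prev = pole * prev + fir
        out.append(prev)
    return out


# Illustrative data only: three frames of two coefficients each.
clean = [[1.0, 2.0], [3.0, -1.0], [5.0, 0.5]]
channel_offset = [0.7, -0.3]  # hypothetical constant channel bias
distorted = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

In this sketch, `normalized_clean` and `normalized_distorted` agree up to floating-point rounding, because the constant offset is absorbed into the subtracted mean. Cepstral mean subtraction needs the whole utterance before it can compute the mean, whereas the RASTA-style filter runs frame by frame at the cost of a start-up transient.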
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cho, HY., Chi, SM., Oh, YH. (1998). A robust front-end for telephone speech recognition. In: Lee, HY., Motoda, H. (eds) PRICAI’98: Topics in Artificial Intelligence. PRICAI 1998. Lecture Notes in Computer Science, vol 1531. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095307
DOI: https://doi.org/10.1007/BFb0095307
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65271-7
Online ISBN: 978-3-540-49461-4
eBook Packages: Springer Book Archive