ABSTRACT
This paper studies how to improve speech recognition over Bluetooth™ wireless channels. Speech recognition over Bluetooth™ suffers from low SNR caused by the position of the Bluetooth™ microphone, from Bluetooth™ codec distortion, from packet loss over the wireless channel, and from Bluetooth™ channel distortion. Recognition can be improved by transforming the MFCCs (Mel-Frequency Cepstral Coefficients) so that the cumulative density functions of the MFCC values observed during recognition match those estimated on the training data. The cumulative density functions are approximated using a small number of quantiles. Recognition tests on a Bluetooth™ speech database showed a significant increase in recognition accuracy in noisy environments.
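The quantile matching described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact transformation, so this example assumes a piecewise-linear mapping between corresponding quantiles, applied independently to each MFCC dimension. The function names `estimate_quantiles` and `equalize`, the default of four quantile intervals, and the synthetic data in the usage note are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def estimate_quantiles(values, num_intervals=4):
    """Approximate the per-dimension CDF by a few quantiles.

    values: array of shape (frames, dims), e.g. MFCC vectors.
    Returns an array of shape (num_intervals + 1, dims) that holds the
    minimum, the inner quantiles, and the maximum of each dimension.
    """
    probs = np.linspace(0.0, 1.0, num_intervals + 1)
    return np.quantile(values, probs, axis=0)


def equalize(features, train_quantiles):
    """Map features so their quantiles line up with the training quantiles.

    Assumption: a piecewise-linear transform between matching quantiles
    (the abstract does not specify the form of the transform).
    """
    num_points = train_quantiles.shape[0]
    probs = np.linspace(0.0, 1.0, num_points)
    # Quantiles of the data seen at recognition time.
    test_quantiles = np.quantile(features, probs, axis=0)

    out = np.empty(features.shape, dtype=float)
    for d in range(features.shape[1]):
        # Linear interpolation: test quantile k is sent to training quantile k.
        out[:, d] = np.interp(features[:, d],
                              test_quantiles[:, d],
                              train_quantiles[:, d])
    return out
```

As a usage example, one could estimate `train_quantiles = estimate_quantiles(train_mfcc)` once on the training features and then call `equalize(utterance_mfcc, train_quantiles)` on each distorted utterance. Keeping the number of quantiles small means only a few values per dimension have to be estimated from the limited data available at recognition time.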
Index Terms
- Linear histogram equalization in the acoustic feature domain for speech recognition over Bluetooth™ channels