Abstract
In this paper, we investigate the performance of TRAP features on clean and noisy data. Multiple feature sets are evaluated on a corpus that was recorded in both a clean and a noisy environment; in addition, the clean version was artificially reverberated. Each feature set is assembled from selected energy bands, so that multiple recognizers are trained on different energy bands. The outputs of all recognizers are combined with ROVER to obtain a single recognition result. This system is compared to a baseline recognizer that uses Mel frequency cepstral coefficients (MFCC). We find that the use of artificial reverberation generally leads to greater robustness to noise. Furthermore, most TRAP-based features excel in phone recognition. While MFCC features perform better in a matched training/test condition, TRAP features clearly outperform them in a mismatched training/test condition: when training on clean data and evaluating on noisy data, the word accuracy (WA) rises by 173 % relative (from 12.0 % to 32.8 % WA).
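The step that joins the outputs of the band-specific recognizers can be pictured with a minimal sketch of ROVER-style voting (Recognizer Output Voting Error Reduction). This is a simplification under stated assumptions, not the system used in the paper: real ROVER first aligns the hypotheses into a word transition network by dynamic programming and can weight votes by confidence scores. Here the hypotheses are assumed to be already aligned slot by slot, with `None` marking a deletion, and each slot is decided by a plain majority vote. The function name `rover_vote` and the tie-breaking rule are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence

# NULL marks an empty slot (a deletion) in an aligned hypothesis.
NULL: Optional[str] = None


def rover_vote(aligned: Sequence[Sequence[Optional[str]]]) -> list[str]:
    """Majority vote over pre-aligned word hypotheses (simplified ROVER sketch).

    Assumption: all hypotheses are already aligned slot by slot (equal
    length, NULL for deletions). Real ROVER builds this alignment itself.
    Ties are broken in favour of the word proposed by the earliest
    recognizer in the input order (a hypothetical choice).
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(h) != length for h in aligned):
        raise ValueError("all hypotheses must be aligned to the same length")

    result: list[str] = []
    for slot in range(length):
        # Collect the word each recognizer proposes for this slot.
        words = [h[slot] for h in aligned]
        counts = Counter(words)
        best = max(counts.values())
        # First word (in recognizer order) that reaches the top count wins.
        winner = next(w for w in words if counts[w] == best)
        # A winning NULL means the slot is dropped from the output.
        if winner is not NULL:
            result.append(winner)
    return result
```

For example, if three recognizers propose `["go", "left", "now"]`, `["go", "right", None]`, and `["no", "left", "now"]`, every slot is won 2:1 and the sketch returns `["go", "left", "now"]`: a recognizer that errs on a single word is outvoted as long as the others agree on that slot.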
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maier, A., Hacker, C., Steidl, S., Nöth, E., Niemann, H. (2005). Robust Parallel Speech Recognition in Multiple Energy Bands. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds) Pattern Recognition. DAGM 2005. Lecture Notes in Computer Science, vol 3663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550518_17
DOI: https://doi.org/10.1007/11550518_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28703-2
Online ISBN: 978-3-540-31942-9
eBook Packages: Computer Science, Computer Science (R0)