Authors:
Jiri Malek; Petr Cerva; Ladislav Seps and Jan Nouza
Affiliation:
Technical University of Liberec, Czech Republic
Keyword(s):
Deep Neural Networks, Bottleneck Features, Real-world Nonlinear Distortion, Robust Speech Recognition.
Related Ontology Subjects/Areas/Topics:
Design and Implementation of Signal Processing Systems; Multimedia; Multimedia Signal Processing; Multimedia Systems and Applications; Neural Networks, Spiking Systems, Genetic Algorithms and Fuzzy Logic; Telecommunications
Abstract:
This paper focuses on the robust recognition of nonlinearly distorted speech. We have reported (Seps et al.,
2014) that hybrid acoustic models based on a combination of Hidden Markov Models and Deep Neural Networks
(HMM-DNNs) are better suited to this task than conventional HMMs utilizing Gaussian Mixture Models
(HMM-GMMs). To further improve recognition accuracy, this paper investigates the possibility of combining
the modeling power of deep neural networks with the adaptation to given acoustic conditions. For this
purpose, deep neural networks are used to produce bottleneck coefficients (BNCs), also called bottleneck features. The BNCs
are subsequently used to train HMM-GMM-based acoustic models, and the resulting system is then adapted using Constrained
Maximum Likelihood Linear Regression (CMLLR). Our results obtained for three types of nonlinear distortions
and three types of input features show that the adapted BNC-based system (a) outperforms HMM-DNN
acoustic models in the case of strong compression and (b) yields comparable performance for speech affected
by nonlinear amplification in the analog domain.
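
The pipeline described in the abstract (a deep network produces bottleneck features, which are then transformed for the given acoustic conditions) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the layer sizes, the random weights, the linear bottleneck output, and the helper names `bottleneck_features` and `apply_cmllr`. The sketch only shows how a feature-space affine transform of the CMLLR form x' = Ax + b is applied; the maximum-likelihood estimation of A and b is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, x, activation=None):
    """Compute activation(W x + b) for a single feature frame x."""
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def random_layer(n_out, n_in, rng):
    """Hypothetical untrained layer: small random weights, zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def bottleneck_features(frame, layers, bottleneck_index):
    """Forward one frame and return the bottleneck layer's linear outputs.

    Layers after the bottleneck are only needed for training and are
    skipped here. Taking the outputs before the nonlinearity is an
    assumption of this sketch.
    """
    h = frame
    for i, (w, b) in enumerate(layers):
        if i == bottleneck_index:
            return dense(w, b, h)
        h = dense(w, b, h, sigmoid)
    raise ValueError("bottleneck_index out of range")


def apply_cmllr(feature, A, b):
    """Apply a feature-space affine transform x' = A x + b."""
    return dense(A, b, feature)


rng = random.Random(0)
# Assumed topology: 39 inputs -> 64 hidden -> 8 bottleneck -> 64 hidden.
layers = [random_layer(64, 39, rng),
          random_layer(8, 64, rng),
          random_layer(64, 8, rng)]
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]

bnc = bottleneck_features(frame, layers, bottleneck_index=1)

# Placeholder transform (identity matrix plus a constant shift); in practice
# A and b would be estimated from adaptation data.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
adapted = apply_cmllr(bnc, identity, [0.5] * 8)
```

In a full system, the adapted bottleneck vectors would serve as the observations for the HMM-GMM acoustic model instead of the original input features.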