Microphone distance adaptation is an important and challenging problem for far-field speech recognition with a single distant microphone. This paper investigates the use of Cluster Adaptive Training (CAT) to learn a structured Deep Neural Network (DNN) that can be quickly adapted at test time to changes in the distance between the microphone and the speaker. A speech corpus was created by re-recording the Wall Street Journal (WSJ0) audio with far-field microphones placed at eight different distances from the source. Experimental results show that unsupervised adaptation of the CAT-DNN model achieved an absolute word error rate reduction of up to 0.9% compared with the canonical model trained on multi-style data.
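As an illustration of the general idea behind a cluster-adaptive layer, the following is a minimal sketch, not the authors' implementation. It assumes the common CAT formulation in which a layer's weight matrix is a weighted sum of per-cluster basis matrices, so that only a small vector of interpolation weights needs to be estimated for a new condition. The function names `interpolate_weights` and `cat_layer_preactivation`, the two-cluster setup, and the toy values are all hypothetical; biases and the nonlinearity are omitted.

```python
from typing import List

Matrix = List[List[float]]


def interpolate_weights(bases: Matrix_list if False else List[Matrix],
                        lambdas: List[float]) -> Matrix:
    """Combine cluster basis matrices: W = sum_c lambdas[c] * bases[c]."""
    if len(bases) != len(lambdas):
        raise ValueError("need one interpolation weight per cluster")
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(lam * b[i][j] for lam, b in zip(lambdas, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def cat_layer_preactivation(x: List[float], bases: List[Matrix],
                            lambdas: List[float]) -> List[float]:
    """Apply the interpolated weight matrix to input x (no bias, no nonlinearity)."""
    w = interpolate_weights(bases, lambdas)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


# Hypothetical toy example: two clusters, a 2x2 layer.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],  # assumed basis for cluster 0
    [[0.0, 1.0], [1.0, 0.0]],  # assumed basis for cluster 1
]
lambdas = [0.75, 0.25]         # assumed interpolation weights for one condition
output = cat_layer_preactivation([2.0, 4.0], bases, lambdas)
```

Under this assumed formulation, the basis matrices stay fixed after training and only the interpolation weights would be re-estimated at test time, which is what would make adaptation to a new microphone distance quick.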
Cite as: Prasad, A., Sim, K.C. (2016) Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition. Proc. Interspeech 2016, 3823-3827, doi: 10.21437/Interspeech.2016-738
@inproceedings{prasad16_interspeech,
  author={Animesh Prasad and Khe Chai Sim},
  title={{Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3823--3827},
  doi={10.21437/Interspeech.2016-738}
}