ISCA Archive Interspeech 2021

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition

Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng

Automatic recognition of elderly and disordered speech remains a highly challenging task to date. Such data is not only difficult to collect in large quantities, but also exhibits a significant mismatch against ASR systems trained on normal speech. To address this mismatch, conventional deep neural network model adaptation approaches consider only parameter fine-tuning on limited target domain data. In this paper, a novel Bayesian parametric and neural architectural domain adaptation approach is proposed. Both the standard model parameters and the architectural hyper-parameters (hidden layer left/right context offsets) of two lattice-free MMI (LF-MMI) factored TDNN systems, separately trained using large quantities of normal speech from the English LibriSpeech and Cantonese SpeechOcean corpora, were domain adapted to two tasks: a) the 16-hour DementiaBank elderly speech corpus; and b) the 14-hour CUDYS dysarthric speech database. A Bayesian differentiable architecture search (DARTS) super-network was designed to allow both efficient search over up to 7^28 different TDNN structures during domain adaptation, and robust modelling of parameter uncertainty given limited target domain data. Absolute recognition error rate reductions of 1.82% and 2.93% (13.2% and 8.3% relative) were obtained over the baseline systems performing model parameter fine-tuning only. Consistent performance improvements were retained after data augmentation and learning hidden unit contributions (LHUC) based speaker adaptation were applied.
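The absolute and relative reductions quoted in the abstract are tied together by the baseline error rate (relative = absolute / baseline). The short Python sketch below only illustrates that arithmetic. The function names are hypothetical. The baseline values it prints are back-computed from the rounded figures in the abstract, so they are approximations rather than results reported by the authors.

```python
def implied_baseline(absolute_reduction: float, relative_reduction: float) -> float:
    """Baseline error rate (in %) implied by an absolute reduction (in % points)
    and a relative reduction (as a fraction): baseline = absolute / relative."""
    return absolute_reduction / relative_reduction


def adapted_error(baseline: float, absolute_reduction: float) -> float:
    """Error rate (in %) left after subtracting the absolute reduction."""
    return baseline - absolute_reduction


# (absolute reduction in % points, relative reduction as a fraction),
# in the order the pairs appear in the abstract.
reported = [(1.82, 0.132), (2.93, 0.083)]

for abs_red, rel_red in reported:
    base = implied_baseline(abs_red, rel_red)
    after = adapted_error(base, abs_red)
    # Values are approximate because the inputs are rounded.
    print(f"implied baseline ~{base:.1f}%, after adaptation ~{after:.1f}%")
```

Because the inputs are rounded to two or three significant figures, the implied baselines are accurate only to about one decimal place.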


doi: 10.21437/Interspeech.2021-289

Cite as: Deng, J., Gutierrez, F.R., Hu, S., Geng, M., Xie, X., Ye, Z., Liu, S., Yu, J., Liu, X., Meng, H. (2021) Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition. Proc. Interspeech 2021, 4818-4822, doi: 10.21437/Interspeech.2021-289

@inproceedings{deng21d_interspeech,
  author={Jiajun Deng and Fabian Ritter Gutierrez and Shoukang Hu and Mengzhe Geng and Xurong Xie and Zi Ye and Shansong Liu and Jianwei Yu and Xunying Liu and Helen Meng},
  title={{Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4818--4822},
  doi={10.21437/Interspeech.2021-289}
}