Using data from multiple dialects has shown promise in improving neural network acoustic models. While such training can improve the performance of an acoustic model on a single dialect, it can also produce a model capable of good performance on multiple dialects. However, training an acoustic model on pooled data from multiple dialects takes a significant amount of time and computing resources, and the model must be retrained every time a new dialect is added. In contrast, sequential transfer learning (fine-tuning) does not require retraining on all of the data, but it may result in catastrophic forgetting of previously seen dialects. Using data from four English dialects, we demonstrate that by using loss functions that mitigate catastrophic forgetting, sequential transfer learning can be used to train multi-dialect acoustic models that narrow the WER gap between the best (combined training) and worst (fine-tuning) case by up to 65%. Continual learning shows great promise in minimizing training time while approaching the performance of models that require much more training time.
Cite as: Houston, B., Kirchhoff, K. (2020) Continual Learning for Multi-Dialect Acoustic Models. Proc. Interspeech 2020, 576-580, doi: 10.21437/Interspeech.2020-1797
@inproceedings{houston20_interspeech,
  author={Brady Houston and Katrin Kirchhoff},
  title={{Continual Learning for Multi-Dialect Acoustic Models}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={576--580},
  doi={10.21437/Interspeech.2020-1797}
}
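The abstract refers to loss functions that mitigate catastrophic forgetting during sequential fine-tuning. One widely used loss of this kind is Elastic Weight Consolidation (EWC), which adds a quadratic penalty anchoring parameters important for earlier tasks (here, earlier dialects) to their previous values, weighted by an estimate of the diagonal Fisher information. The abstract does not specify which loss the paper uses, so the sketch below is purely illustrative; the function name `ewc_loss`, the parameter names, and the toy values are all assumptions.

```python
import numpy as np

def ewc_loss(task_loss, params, old_params, fisher, lam=0.4):
    """EWC-style regularized loss (illustrative sketch).

    task_loss  : scalar loss on the current dialect's data
    params     : current model parameters (flattened)
    old_params : parameters learned on previously seen dialects
    fisher     : per-parameter importance (diagonal Fisher estimate)
    lam        : strength of the forgetting penalty
    """
    # Quadratic penalty: drifting away from parameters that mattered
    # for earlier dialects is costly, mitigating catastrophic forgetting.
    penalty = 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)
    return task_loss + penalty

# Toy usage (made-up numbers): high-Fisher parameters are held close
# to their old values, low-Fisher ones are free to adapt.
theta_old = np.array([1.0, -0.5, 2.0])   # after training on dialect A
theta_new = np.array([1.2, -0.4, 1.5])   # while fine-tuning on dialect B
fisher    = np.array([5.0, 0.1, 2.0])    # per-parameter importance
loss = ewc_loss(0.8, theta_new, theta_old, fisher, lam=0.4)
```

In practice the Fisher diagonal would be estimated from squared gradients on the earlier dialect's data, and one anchor (or penalty term) can be kept per previously learned dialect.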