ISCA Archive Interspeech 2021

Adjunct-Emeritus Distillation for Semi-Supervised Language Model Adaptation

Scott Novotney, Yile Gu, Ivan Bulyko

To improve customer privacy, commercial speech applications are reducing human transcription of customer data. This hurts language model training because fewer in-domain transcripts are available. Prior work demonstrated that training on automated transcripts alone provides only modest gains due to reinforcement of recognition errors. We consider a new condition, where a model trained on historical human transcripts, but not the transcripts themselves, is available to us. To overcome temporal drift in vocabulary and topics, we propose a novel extension of knowledge distillation, adjunct-emeritus distillation, in which two imperfect teachers jointly train a student model. We conduct experiments on an English voice assistant domain and simulate a one-year gap in human transcription. Unlike fine-tuning, our approach is architecture agnostic; it achieves a 14% relative reduction in perplexity over the baseline approach of freezing model development and also improves over a standard knowledge-distillation baseline.
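The abstract does not spell out the training objective, but a minimal sketch of a two-teacher distillation loss is shown below, assuming the student matches an interpolation of softened distributions from an "adjunct" teacher (trained on recent automated transcripts) and an "emeritus" teacher (trained on historical human transcripts). The function name, the interpolation weight `alpha`, and the temperature scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adjunct_emeritus_distillation_loss(student_logits,
                                        adjunct_logits,
                                        emeritus_logits,
                                        alpha=0.5,
                                        temperature=1.0):
    """Distill a student LM from two imperfect teachers (hypothetical sketch).

    student_logits:  [batch, vocab] logits from the student language model
    adjunct_logits:  [batch, vocab] logits from the teacher trained on
                     recent automated (ASR) transcripts
    emeritus_logits: [batch, vocab] logits from the teacher trained on
                     historical human transcripts
    alpha:           assumed interpolation weight between the two teachers
    """
    # Soften each teacher's distribution with a temperature, then interpolate.
    adjunct_probs = F.softmax(adjunct_logits / temperature, dim=-1)
    emeritus_probs = F.softmax(emeritus_logits / temperature, dim=-1)
    target_probs = alpha * adjunct_probs + (1.0 - alpha) * emeritus_probs

    # KL divergence between the blended teacher target and the student.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, target_probs, reduction="batchmean")
```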


doi: 10.21437/Interspeech.2021-27

Cite as: Novotney, S., Gu, Y., Bulyko, I. (2021) Adjunct-Emeritus Distillation for Semi-Supervised Language Model Adaptation. Proc. Interspeech 2021, 866-870, doi: 10.21437/Interspeech.2021-27

@inproceedings{novotney21_interspeech,
  author={Scott Novotney and Yile Gu and Ivan Bulyko},
  title={{Adjunct-Emeritus Distillation for Semi-Supervised Language Model Adaptation}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={866--870},
  doi={10.21437/Interspeech.2021-27}
}