ISCA Archive Interspeech 2016

Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS Program

Damianos Karakos, Scott Novotney, Le Zhang, Richard Schwartz

Model adaptation is an important task in many human language technology fields, as it allows one to reduce differences that arise due to various forms of variability. Here, we focus on the speech activity detection (SAD) task, in the context of the DARPA RATS program, where the training data do not cover all channels (transmitter/receiver characteristics) that are encountered at test time. For supervised adaptation, limited manually labeled data from the (novel) channel of interest are used to adapt the model; for unsupervised adaptation, the labels are automatically generated with a baseline model. The modeling is done with long short-term memory neural networks, and we make the case that strong regularization is of paramount importance when adapting such models. Results on two different datasets show that adaptation gives rise to large gains (at least 27% [...]). A related task, that of active learning, is also considered. In active learning, data to be annotated for supervised adaptation are selected automatically, with the ultimate goal of maximizing performance. We investigate an algorithm for active learning that utilizes the output of a SAD decoder and show that it performs significantly better (by 10% relative) than random selection.


doi: 10.21437/Interspeech.2016-603

Cite as: Karakos, D., Novotney, S., Zhang, L., Schwartz, R. (2016) Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS Program. Proc. Interspeech 2016, 3678-3682, doi: 10.21437/Interspeech.2016-603

@inproceedings{karakos16_interspeech,
  author={Damianos Karakos and Scott Novotney and Le Zhang and Richard Schwartz},
  title={{Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS Program}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3678--3682},
  doi={10.21437/Interspeech.2016-603}
}