Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17

McCree, Alan; Snyder, David; Sell, Greg; Garcia-Romero, Daniel

doi:10.21437/Odyssey.2018-10

This paper presents our newest language recognition systems developed for NIST LRE17. For this challenging limited-data, multidomain task, we were able to get very good performance with our state-of-the-art DNN senone and bottleneck joint i-vector systems by effectively utilizing all of the available training and development data. Data augmentation techniques were very valuable for this task, and our discriminative Gaussian classifier combined with naive fusion used all of the development data for system design rather than holding some out for separate back-end training. Finally, our newest research with discriminatively trained DNN embeddings allowed us to replace i-vectors with more powerful x-vectors to further improve language recognition accuracy, resulting in very good LRE17 performance for this single system, our JHU HLTCOE site fusion primary submission, and the JHU MIT team submission.

Cite as: McCree, A., Snyder, D., Sell, G., Garcia-Romero, D. (2018) Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17. Proc. The Speaker and Language Recognition Workshop (Odyssey 2018), 68-73, doi: 10.21437/Odyssey.2018-10

@inproceedings{mccree18_odyssey,
  author={Alan McCree and David Snyder and Greg Sell and Daniel Garcia-Romero},
  title={{Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17}},
  year=2018,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2018)},
  pages={68--73},
  doi={10.21437/Odyssey.2018-10}
}