ISCA Archive Interspeech 2016

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors

Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan

Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort in OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3% in DCF over this SAD. Further, a relative decrease of 17.4% is obtained by incorporating channel detection side information.
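
As a rough illustration of how the figures in the abstract can be read, the sketch below computes a weighted detection cost from miss and false-alarm rates, and the relative decrease between two DCF values. The weights (0.75 for misses, 0.25 for false alarms) are an assumption here, since the abstract does not state them, and all numeric inputs are hypothetical placeholders rather than results from the paper.

```python
def dcf(p_miss, p_fa, w_miss=0.75, w_fa=0.25):
    """Weighted detection cost from miss and false-alarm rates.

    The default weights are an assumption; the abstract does not give them.
    """
    return w_miss * p_miss + w_fa * p_fa


def relative_decrease(baseline, improved):
    """Relative decrease of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical values, chosen only to show the arithmetic.
best_single_dcf = 0.10   # placeholder DCF of the best individual SAD
fused_dcf = 0.0907       # placeholder DCF after fusion

print(f"DCF example: {dcf(p_miss=0.04, p_fa=0.02):.3f}")
print(f"Relative decrease: {relative_decrease(best_single_dcf, fused_dcf):.1f}%")
```

Under this reading, a relative decrease of 9.3% means the fused system's DCF is 90.7% of the best single system's DCF, not that the DCF dropped by 9.3 absolute percentage points.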


doi: 10.21437/Interspeech.2016-1281

Cite as: Kinnunen, T., Sholokhov, A., Khoury, E., Thomsen, D.A.L., Sahidullah, M., Tan, Z.-H. (2016) HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. Proc. Interspeech 2016, 2992-2996, doi: 10.21437/Interspeech.2016-1281

@inproceedings{kinnunen16b_interspeech,
  author={Tomi Kinnunen and Alexey Sholokhov and Elie Khoury and Dennis Alexander Lehmann Thomsen and Md. Sahidullah and Zheng-Hua Tan},
  title={{HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2992--2996},
  doi={10.21437/Interspeech.2016-1281}
}