BIT Submission for the Conversational Speaker Diarization Challenge

Hu, Chenguang; Zhan, Qingran; Liu, Miao; Xie, Xiang

doi:10.21437/Odyssey.2022-21

This paper describes the BIT (Beijing Institute of Technology) system submitted to the Conversational Speaker Diarization Challenge. We first present the details of the front-end system, which comprises a Speech Activity Detection (SAD) module and a speaker embedding extraction module. Then, based on the results of the clustering-based module, we investigate two iterative back-end models that use a multi-scale similarity measure: a Support Vector Classifier (SVC) system and a U-Net system. Finally, the DOVER algorithm is adopted for model fusion. Experimental results show that our system yields a diarization error rate (DER) of 5.18% in the challenge, a relative improvement of 34% over the baseline system provided by the organizer. Our system won first place among all submitted systems without using any additional embedding extraction model.
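The two headline figures in the abstract (a DER of 5.18% and a 34% relative improvement) can be related with a small sketch. It assumes the standard definition of DER as missed speech plus false alarm plus speaker confusion, divided by total reference speech time; the challenge's exact scoring settings (for example any collar or overlap handling) are not stated here. The function names are illustrative and do not come from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER as a fraction: summed error durations over total reference speech time.

    All arguments are durations in the same unit (e.g. seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_improvement(baseline_der, system_der):
    """Relative DER reduction of a system with respect to a baseline."""
    return (baseline_der - system_der) / baseline_der


def implied_baseline(system_der, rel_improvement):
    """Baseline DER implied by a system DER and a relative improvement."""
    return system_der / (1.0 - rel_improvement)


if __name__ == "__main__":
    # Figures quoted in the abstract: DER of 5.18%, 34% relative improvement.
    print(f"implied baseline DER: {implied_baseline(5.18, 0.34):.2f}%")
```

Because the 34% figure is rounded, the implied baseline is only a back-of-the-envelope estimate of roughly 7.8 to 7.9%, not a value reported in the abstract.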

Cite as: Hu, C., Zhan, Q., Liu, M., Xie, X. (2022) BIT Submission for the Conversational Speaker Diarization Challenge. Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 148-155, doi: 10.21437/Odyssey.2022-21

@inproceedings{hu22_odyssey,
  author={Chenguang Hu and Qingran Zhan and Miao Liu and Xiang Xie},
  title={{BIT Submission for the Conversational Speaker Diarization Challenge}},
  year=2022,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2022)},
  pages={148--155},
  doi={10.21437/Odyssey.2022-21}
}