The 2020 Personalized Voice Trigger Challenge (PVTC2020) addresses two different research problems in a unified setup: joint wake-up word detection with speaker verification on close-talking single-microphone data and on far-field multi-channel microphone array data. Specifically, the second task poses an additional cross-channel matching challenge on top of the far-field condition. To simulate a real-life application scenario, the enrollment utterances are recorded from a close-talking cell phone only, while the test utterances are recorded from both the close-talking cell phone and the far-field microphone arrays. This paper introduces our challenge setup and the released database as well as the evaluation metrics. In addition, we present a sequential two-stage end-to-end neural network baseline system trained with the proposed database for speaker-dependent wake-up word detection. Results show that state-of-the-art personalized voice trigger methods are still based on the two-stage design; however, this benchmark database could also be used to evaluate multi-task joint learning methods. The official website, the open-source baseline system and results of submitted systems have been released.
Cite as: Jia, Y., Wang, X., Qin, X., Zhang, Y., Wang, X., Wang, J., Zhang, D., Li, M. (2021) The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results. Proc. Interspeech 2021, 4239-4243, doi: 10.21437/Interspeech.2021-602
@inproceedings{jia21b_interspeech, author={Yan Jia and Xingming Wang and Xiaoyi Qin and Yinping Zhang and Xuyang Wang and Junjie Wang and Dong Zhang and Ming Li}, title={{The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results}}, year=2021, booktitle={Proc. Interspeech 2021}, pages={4239--4243}, doi={10.21437/Interspeech.2021-602} }