ISCA Archive Interspeech 2021

Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion

Weiguang Chen, Van Tung Pham, Eng Siong Chng, Xionghu Zhong

Overlapped speech is widely present in conversations and can cause significant performance degradation in speech processing tasks such as diarization, enhancement, and recognition. Detecting overlapped speech, particularly when the speakers are in the far field, is challenging because the overlapped segments are usually short and heavy reverberation and noise may be present in the conversation scenario. Existing solutions overwhelmingly rely on spectral features extracted from a single-microphone signal to perform the detection. In this paper, we propose a novel detection approach that uses a microphone array and fuses the spatial and spectral features extracted from the multi-channel array signal. Two categories of spatial features are considered to model the speakers' spatial characteristics: directional statistics projected onto spherical location grids, and the generalized cross-correlation function with phase transform (GCC-PHAT). These spatial features are then fused with the spectral features by a Gated Multimodal Unit (GMU) to detect the overlapped speech. The proposed approach is evaluated on the AMI and CHiME-6 corpora. Experimental results show that the proposed feature fusion approach achieves better performance than methods using spectral features only.
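
The fusion step can be illustrated with a minimal sketch of a bimodal gated multimodal unit. It follows the commonly cited GMU formulation (a tanh projection per modality and a sigmoid gate computed from the concatenated inputs); the abstract does not give the paper's exact configuration, so the function names, the feature sizes, and the random weights below are assumptions made for illustration only, not the authors' implementation.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def sigmoid(value):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def gmu_fuse(spectral, spatial, w_spectral, w_spatial, w_gate):
    """Fuse two feature vectors (Python lists) with a bimodal GMU.

    h_spec = tanh(W_spec x_spec)
    h_spat = tanh(W_spat x_spat)
    z      = sigmoid(W_gate [x_spec; x_spat])
    h      = z * h_spec + (1 - z) * h_spat
    """
    h_spectral = [math.tanh(v) for v in matvec(w_spectral, spectral)]
    h_spatial = [math.tanh(v) for v in matvec(w_spatial, spatial)]
    # The gate sees both modalities (list concatenation) and decides,
    # per hidden unit, how much each modality contributes.
    gate = [sigmoid(v) for v in matvec(w_gate, spectral + spatial)]
    fused = [z * a + (1.0 - z) * b
             for z, a, b in zip(gate, h_spectral, h_spatial)]
    return fused, gate


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed): 6 spectral features, 3 spatial features, 4 hidden units.
rng = random.Random(0)
HIDDEN, N_SPECTRAL, N_SPATIAL = 4, 6, 3
spectral_frame = [rng.gauss(0.0, 1.0) for _ in range(N_SPECTRAL)]
spatial_frame = [rng.gauss(0.0, 1.0) for _ in range(N_SPATIAL)]

fused, gate = gmu_fuse(
    spectral_frame,
    spatial_frame,
    random_matrix(HIDDEN, N_SPECTRAL, rng),
    random_matrix(HIDDEN, N_SPATIAL, rng),
    random_matrix(HIDDEN, N_SPECTRAL + N_SPATIAL, rng),
)
```

In a detector of this kind, the fused vector would typically feed a frame-level classifier that outputs an overlap probability. Because the gate is learned, such a model can lean on spatial cues for some frames and on spectral cues for others.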


doi: 10.21437/Interspeech.2021-2138

Cite as: Chen, W., Pham, V.T., Chng, E.S., Zhong, X. (2021) Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion. Proc. Interspeech 2021, 4189-4193, doi: 10.21437/Interspeech.2021-2138

@inproceedings{chen21t_interspeech,
  author={Weiguang Chen and Van Tung Pham and Eng Siong Chng and Xionghu Zhong},
  title={{Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4189--4193},
  doi={10.21437/Interspeech.2021-2138}
}