Abstract:
Speaker diarization is the task of estimating “who spoke when” in a meeting. To achieve accurate diarization in real meetings, we have to deal with noise, speaker overlap, reverberation, etc. In this work, we propose to model the directional statistics of spatial clusters via a dictionary of probabilistic models. The dictionary is trained using spatial features of possible source locations. Observed mixtures of multiple source signals are statistically represented as a weighted sum of the trained models, where each weight defines the activity of a source associated with a spatial location or cluster. To detect the active clusters and perform speaker diarization, the weights are estimated by applying Bayes' rule. Furthermore, a Laplace distribution is proposed to model the background noise. The proposed method was evaluated on real meetings, and it achieved high performance compared with a baseline method.
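The weight estimation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the directional distribution or the spatial features, so the sketch assumes scalar direction-of-arrival angles and a fixed dictionary of von Mises densities, and it omits the Laplace noise model. The names `von_mises_pdf`, `estimate_weights`, `mus`, and `kappa` are hypothetical. Each iteration applies Bayes' rule to obtain the posterior of every dictionary entry for every observation, then averages those posteriors to update the weights.

```python
import numpy as np


def von_mises_pdf(x, mu, kappa):
    """Von Mises density on the circle (an assumed stand-in for the
    paper's directional models, which the abstract does not specify)."""
    return np.exp(kappa * np.cos(x - mu)) / (2.0 * np.pi * np.i0(kappa))


def estimate_weights(lik, n_iter=50):
    """Estimate mixture weights for a FIXED dictionary of models.

    lik[t, k] holds p(x_t | model k). Only the weights are updated;
    the dictionary itself is treated as already trained.
    """
    n_obs, n_models = lik.shape
    w = np.full(n_models, 1.0 / n_models)  # start from uniform weights
    for _ in range(n_iter):
        joint = lik * w                                   # p(x_t, k)
        post = joint / joint.sum(axis=1, keepdims=True)   # Bayes' rule: p(k | x_t)
        w = post.mean(axis=0)                             # weight update
    return w


# Toy usage: four candidate locations, only two of them active.
rng = np.random.default_rng(0)
mus = np.array([0.0, np.pi / 2, np.pi, -np.pi / 2])  # hypothetical dictionary
kappa = 20.0                                          # hypothetical concentration
x = np.concatenate([
    rng.vonmises(mus[0], kappa, size=300),  # source at location 0
    rng.vonmises(mus[2], kappa, size=100),  # source at location 2
])
lik = von_mises_pdf(x[:, None], mus[None, :], kappa)
weights = estimate_weights(lik)
print(np.round(weights, 3))
```

In this toy setting, a location could be declared active when its weight exceeds a threshold; the threshold and the per-frame segmentation needed for actual diarization are left out of the sketch.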
Date of Conference: 13-16 September 2016
Date Added to IEEE Xplore: 24 October 2016