Abstract:
This paper addresses the problem of real-time speaker change detection in broadcast news, in which no prior knowledge on speakers is assumed. Our speaker segmentation is ...Show MoreMetadata
Abstract:
This paper addresses the problem of real-time speaker change detection in broadcast news, in which no prior knowledge on speakers is assumed. Our speaker segmentation is a "coarse to refine" process, which consists of two stages: pre-segmentation and refinement. In the pre-segmentation stage, a new approach based on Gaussian mixture model-universal background model (GMM-UBM) is proposed to categorize feature vectors into three sets, i.e. reliable speaker-related set, doubtful speaker-related set and unreliable speaker-related set, in order to enhance the effect of the reliable speaker-related feature vectors. Then potential speaker change boundaries are detected based on a novel distance measure. In the refinement stage, incremental speaker adaptation (ISA), which is suitable for real-time requirement, is proposed to obtain considerably precise speaker models so that the potential speaker change boundaries can be confirmed and refined. Experimental results demonstrate that our approach yields satisfactory performance.
Published in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
Date of Conference: 06-10 April 2003
Date Added to IEEE Xplore: 05 June 2003
Print ISBN:0-7803-7663-3
Print ISSN: 1520-6149