Abstract:
This paper studies a new technique that characterizes a speaker by the difference between the speaker and a cohort of background speakers, expressed in the form of feature-space maximum a posteriori linear regression (fMAPLR). The fMAPLR is a linear regression function, also known as an affine transform, that projects speaker-dependent features to speaker-independent ones. It consists of two sets of parameters: bias vectors and transform matrices. The former, representing first-order information, are more robust than the latter, which represent second-order information. We propose a flexible tying scheme that allows the bias vectors and the matrices to be associated with different regression classes, so that both sets of parameters are given sufficient statistics in a speaker verification task. We formulate a maximum a posteriori (MAP) algorithm for estimating the feature transform parameters, which further alleviates possible numerical problems. The fMAPLR parameters are then vectorized and compared via a support vector machine (SVM). We conduct experiments on the National Institute of Standards and Technology (NIST) 2006 and 2008 Speaker Recognition Evaluation databases. The experiments show that the proposed technique consistently outperforms the baseline Gaussian mixture model (GMM)-SVM speaker verification system.
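As a rough illustration of the two ideas named in the abstract, the sketch below applies an affine feature transform (a matrix multiplication plus a bias) and then flattens the transform parameters into a single vector of the kind that could be passed to an SVM. It is a toy sketch under stated assumptions, not the paper's implementation: the function names `affine_transform` and `vectorize`, the 2-dimensional values, the choice of one matrix class and two bias classes, and the row-major ordering of the flattened vector are all hypothetical. The MAP estimation of the parameters and the SVM itself are not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine_transform(x: Vector, A: Matrix, b: Vector) -> Vector:
    """Map a feature vector x to A x + b (an affine transform)."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def vectorize(matrices: List[Matrix], biases: List[Vector]) -> Vector:
    """Flatten all matrices (row-major), then all bias vectors.

    The ordering is an assumption made for this sketch.
    """
    out: Vector = []
    for A in matrices:
        for row in A:
            out.extend(row)
    for b in biases:
        out.extend(b)
    return out


# Hypothetical toy setup: the matrices and the biases are tied to
# different numbers of regression classes (one matrix, two biases).
A = [[1.0, 0.5], [0.0, 2.0]]
biases = [[0.1, -0.2], [0.3, 0.4]]

x = [2.0, 1.0]                          # one speaker-dependent feature vector
y = affine_transform(x, A, biases[0])   # transformed feature vector
supervector = vectorize([A], biases)    # 4 matrix entries + 4 bias entries
```

Tying the biases to more regression classes than the matrices, as in this toy setup, mirrors the abstract's observation that bias vectors are the more robust parameters; the actual class counts used in the paper are not stated in the abstract.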
Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume: 19, Issue: 3, March 2011)