Abstract:
In this paper, we present a frame-correlation-based autoregressive GMM method for voice conversion. In our system, the cross-frame correlation of the source feature is modeled with augmented delta features, and the cross-frame correlation of the target feature is modeled by autoregressive models. The expectation maximization (EM) algorithm is used for model training, and a maximum likelihood parameter conversion algorithm is then employed to convert the features of a source speaker into those of a target speaker frame by frame. The method is consistent between training and conversion because it uses the cross-frame correlation of the target feature explicitly at both stages. Experimental results show that the proposed method performs well: its test-set log probability is higher than that of the GMM-DYN (GMM with dynamic features) method, and its subjective evaluation results are comparable to those of GMM-DYN. Furthermore, it is much better suited to low-latency applications.
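As a rough illustration of what "augmented delta features" can look like on the source side, the sketch below appends a central-difference delta to each static frame. The function name `append_delta`, the delta formula, and the edge handling are assumptions made for illustration only; the abstract does not state which delta window the authors use.

```python
from typing import List


def append_delta(frames: List[List[float]]) -> List[List[float]]:
    """Augment each static frame with a central-difference delta.

    Assumption (not taken from the paper):
        delta[t] = 0.5 * (x[t+1] - x[t-1])
    with the first and last frames repeated at the sequence edges.
    """
    n = len(frames)
    augmented = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]      # repeat first frame at the start
        next_frame = frames[min(t + 1, n - 1)]  # repeat last frame at the end
        delta = [0.5 * (b - a) for a, b in zip(prev_frame, next_frame)]
        augmented.append(list(frames[t]) + delta)
    return augmented


# Three 2-dimensional static frames become three 4-dimensional frames.
frames = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]
augmented = append_delta(frames)
```

Each output frame holds the static values followed by their deltas, so the feature dimension doubles. A central difference of this kind reads one frame ahead of the current frame.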
Date of Conference: 12-14 September 2014
Date Added to IEEE Xplore: 27 October 2014
Electronic ISBN: 978-1-4799-4219-0