
Identification of Reconstructed Speech

Published: 17 January 2017

Abstract

Both voice conversion and hidden Markov model (HMM)-based speech synthesis can produce artificial voices that mimic a target speaker, and both have been shown to pose serious threats to speaker verification (SV) systems. To enhance the security of SV systems, techniques for detecting converted/synthesized speech must therefore be considered. In both voice conversion and HMM-based synthesis, speech reconstruction transforms a set of acoustic parameters into a reconstructed waveform; identifying reconstructed speech can thus distinguish converted/synthesized speech from human speech. Several works on such identification have been reported, achieving equal error rates (EERs) below 5% in detecting reconstructed speech. However, in cross-database evaluations on different speech databases, we find that the EERs of several test cases exceed 10%, so the robustness of detection algorithms across speech databases needs to be improved. In this article, we propose an algorithm to identify reconstructed speech. Three speech databases and two reconstruction methods are considered in our work, a setting not addressed in previous reports. A high-dimensional data visualization approach is used to analyze the effect of speech reconstruction on the Mel-frequency cepstral coefficients (MFCCs) of speech signals, and Gaussian mixture model (GMM) supervectors of MFCCs are used as acoustic features. A set of commonly used classification algorithms is then applied to identify reconstructed speech; based on a comparison among these methods, linear discriminant analysis (LDA)-ensemble classifiers are chosen for our algorithm. Extensive experimental results show that the proposed algorithm achieves EERs below 1% in most cases, outperforming the reported state-of-the-art identification techniques.
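As a concrete illustration, the sketch below mirrors the pipeline the abstract describes: MFCC frames are pooled to train a universal background model (UBM), each utterance is represented by the MAP-adapted mean supervector of that GMM, and an ensemble of LDA base learners trained on random feature subspaces separates human from reconstructed speech. This is a minimal sketch in Python with scikit-learn, not the authors' implementation; the MFCC order, UBM size, relevance factor, and ensemble settings are illustrative assumptions, and random frame matrices stand in for real MFCC features.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
D = 13   # MFCC order (illustrative placeholder)
K = 64   # number of UBM components (illustrative placeholder)

def map_adapt_supervector(ubm, frames, relevance=16.0):
    # MAP-adapt the UBM means to one utterance (means only, as is standard
    # for GMM-supervector front ends) and stack them into one long vector.
    post = ubm.predict_proba(frames)            # (T, K) frame responsibilities
    n_k = post.sum(axis=0)                      # soft frame counts per component
    f_k = post.T @ frames                       # (K, D) first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]  # data/prior interpolation weights
    mu = alpha * (f_k / np.maximum(n_k, 1e-8)[:, None]) + (1 - alpha) * ubm.means_
    return mu.ravel()                           # supervector of length K * D

# Stand-in data: random frame matrices in place of real MFCC features. In
# practice each "utterance" would be the (n_frames, D) MFCC matrix of one
# speech file, with label 0 for human speech and 1 for reconstructed speech.
utterances = [rng.normal(loc=0.5 * lab, size=(200, D))
              for lab in (0, 1) for _ in range(50)]
labels = np.repeat([0, 1], 50)

# Step 1: train a universal background model (UBM) on pooled frames.
ubm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(utterances))

# Step 2: represent each utterance by its GMM mean supervector.
X = np.vstack([map_adapt_supervector(ubm, u) for u in utterances])

# Step 3: LDA-ensemble classifier -- many LDA base learners, each trained
# on a random subspace of the supervector, combined by voting.
clf = BaggingClassifier(LinearDiscriminantAnalysis(),
                        n_estimators=50, max_features=0.25, bootstrap=False)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))

In a real evaluation, training and test utterances would be drawn from different speech databases to measure the cross-database robustness the abstract emphasizes, with EERs computed from the ensemble's decision scores.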




    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 1
    February 2017, 278 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3012406
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 January 2017
    Accepted: 01 September 2016
    Revised: 01 July 2016
    Received: 01 April 2016
    Published in TOMM Volume 13, Issue 1


    Author Tags

    1. Audio forensics
    2. GMM supervectors
    3. LDA-ensemble classification
    4. MFCC
    5. identification
    6. reconstructed speech
    7. speaker verification

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Characteristic innovation project of Guangdong Province Ordinary University
    • Shenzhen R&D Program
    • National Natural Science Foundation of China


    Cited By

    • (2025) Perceptual visual security index: Analyzing image content leakage for vision language models. Journal of Information Security and Applications, 89, 103988. DOI: 10.1016/j.jisa.2025.103988. Online publication date: Mar-2025.
    • (2023) Ensemble deep learning in speech signal tasks: A review. Neurocomputing, 550, 126436. DOI: 10.1016/j.neucom.2023.126436. Online publication date: Sep-2023.
    • (2022) CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric. ACM Transactions on Multimedia Computing, Communications, and Applications, 19(1s), 1-23. DOI: 10.1145/3529394. Online publication date: 13-Apr-2022.
    • (2022) End-to-End Spoofing Speech Detection based on CNN-LSTM. In Proceedings of the 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), 755-758. DOI: 10.1109/ICFTIC57696.2022.10075096. Online publication date: 2-Dec-2022.
    • (2022) HTK-based speech recognition and corpus-based English vocabulary online guiding system. International Journal of Speech Technology, 25(4), 921-931. DOI: 10.1007/s10772-022-09968-7. Online publication date: 1-Dec-2022.
    • (2020) Identification of VoIP Speech With Multiple Domain Deep Features. IEEE Transactions on Information Forensics and Security, 15, 2253-2267. DOI: 10.1109/TIFS.2019.2960635. Online publication date: 2020.
