Spoofing Speech Detection Using Modified Relative Phase Information


Abstract:

The detection of human and spoofing (synthetic or converted) speech has started to receive an increasing amount of attention. In this paper, modified relative phase (MRP) information extracted from a Fourier spectrum is proposed for spoofing speech detection. Because original phase information is almost entirely lost in spoofing speech using current synthesis or conversion techniques, some phase information extraction methods, such as the modified group delay feature and cosine phase feature, have been shown to be effective for distinguishing human speech from spoofing speech. However, existing phase information-based features cannot obtain very high spoofing speech detection performance because they cannot extract precise phase information from speech. Relative phase (RP) information, which extracts phase information precisely, has been shown to be effective for speaker recognition. In this paper, RP information is applied to spoofing speech detection, where it is expected to achieve better spoofing detection performance. Furthermore, two modifications of the original RP processing are proposed: pseudo pitch synchronization and linear discriminant analysis-based full-band RP extraction. In this study, MRP information is also combined with the Mel-frequency cepstral coefficient (MFCC) and modified group delay. The proposed method was evaluated using the ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge dataset. The results show that the proposed MRP information significantly outperforms the MFCC, modified group delay, and other phase information-based features. For the development dataset, the equal error rate (EER) was reduced from 1.883% for the MFCC and 0.567% for the modified group delay to 0.013% for the MRP. By combining the RP with the MFCC and modified group delay, the EER was reduced to 0.003%.
For the evaluation dataset, the MRP obtained much better performance than the magnitude-based feature and other ph...
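
The abstract reports its results as equal error rates (EER). As a rough illustration of that metric only, the sketch below sweeps a decision threshold over detector scores and returns the point where the false acceptance rate and the false rejection rate are closest. It is not the paper's or the challenge's scoring code: the function name `equal_error_rate`, the toy scores, and the convention that a higher score means "more likely human" are all assumptions made for this example.

```python
def equal_error_rate(human_scores, spoof_scores):
    """Approximate EER in [0, 1] by sweeping thresholds over observed scores.

    Assumption (not from the source): higher score = more likely human speech.
    """
    thresholds = sorted(set(human_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: human speech scored below the threshold.
        frr = sum(s < t for s in human_scores) / len(human_scores)
        # False acceptance: spoofing speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, for illustration only).
human = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(human, spoof))
```

With these toy scores, one human utterance and one spoofed utterance are misclassified at the best threshold, so both error rates meet at 25%. Evaluation toolkits typically interpolate between thresholds rather than picking the nearest one, so values from this sketch can differ slightly from officially reported figures.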
Published in: IEEE Journal of Selected Topics in Signal Processing ( Volume: 11, Issue: 4, June 2017)
Page(s): 660 - 670
Date of Publication: 13 April 2017
