Speech synthesis and voice conversion techniques can readily generate or manipulate speech to deceive speaker verification (SV) systems. Hence, spoofing countermeasures are needed to distinguish human speech from spoofed speech. System-based features are known to contribute significantly to this task. In this paper, we extend a recent study of Linear Prediction (LP) and Long-Term Prediction (LTP)-based features to LP and Nonlinear Prediction (NLP)-based features. To evaluate the effectiveness of the proposed countermeasure, we use the corpora provided for the ASVspoof 2015 challenge. A Gaussian Mixture Model (GMM)-based classifier is used, with the Equal Error Rate (EER, in %) as the performance measure. On the development set, the LP-LTP and LP-NLP features gave average EERs of 4.78% and 9.18%, respectively. Score-level fusion with Mel Frequency Cepstral Coefficients (MFCC) gave an EER of 0.8% for LP-LTP and 1.37% for LP-NLP. With score-level fusion of the LP-LTP, LP-NLP and MFCC features together, the EER is further reduced to 0.57%. The LP-LTP and LP-NLP features have also been found to work well on the Blizzard Challenge 2012 speech database.
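The two evaluation steps named in the abstract, EER computation and score-level fusion, can be illustrated with a minimal sketch. It assumes each trial has a scalar score where a higher value means "more likely human" (for example, a log-likelihood ratio between a human-speech GMM and a spoofed-speech GMM), and it assumes fusion is a weighted sum with a weight `alpha`. The abstract does not state the fusion rule or weights, so the function names, the weighted-sum form and `alpha` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER (as a fraction in [0, 1]).

    Assumes higher scores indicate human (genuine) speech.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(scores_a, scores_b, alpha):
    """Weighted-sum score-level fusion (assumed form; alpha is hypothetical)."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return alpha * a + (1.0 - alpha) * b
```

In use, one would compute `fuse_scores` on the per-trial scores of two feature sets (say, LP-NLP and MFCC), pass the fused genuine and spoofed scores to `equal_error_rate`, and multiply by 100 to report the EER in percent. The weight `alpha` would normally be tuned on held-out data such as a development set.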
Cite as: Bhavsar, H.N., Patel, T.B., Patil, H.A. (2016) Novel Nonlinear Prediction Based Features for Spoofed Speech Detection. Proc. Interspeech 2016, 155-159, doi: 10.21437/Interspeech.2016-1002
@inproceedings{bhavsar16_interspeech,
  author={Himanshu N. Bhavsar and Tanvina B. Patel and Hemant A. Patil},
  title={{Novel Nonlinear Prediction Based Features for Spoofed Speech Detection}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={155--159},
  doi={10.21437/Interspeech.2016-1002}
}