Spoofing attacks against automatic speaker verification (ASV) systems have recently received increased attention, and a number of countermeasures have been developed for detecting high-technology attacks such as speech synthesis and voice conversion. However, the performance of anti-spoofing systems degrades significantly in noisy conditions. To address this issue, we propose a deep learning framework for extracting spoofing identity vectors, combined with the use of soft missing-data masks. The proposed feature extraction employs a convolutional neural network (CNN) followed by a recurrent neural network (RNN) to provide a single deep feature vector per utterance. The CNN acts as a convolutional feature extractor operating at the frame level, while the RNN, applied on top of the CNN outputs, yields a single spoofing identity representation of the whole utterance. Experimental evaluation is carried out on both clean and noisy versions of the ASVspoof 2015 corpus. The results show that our proposal clearly outperforms recently proposed methods such as the popular CQCC+GMM system, as well as similar deep feature systems, in both seen and unseen noisy conditions.
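The two-stage pipeline described above (frame-level CNN, then an RNN that summarizes the utterance into one vector) can be sketched in plain NumPy. The layer sizes, the single convolutional layer, and the Elman-style recurrence below are illustrative placeholders, not the architecture or trained weights of the paper's system:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_frames(X, W, b):
    """Frame-level CNN stand-in: a 1-D convolution over a k-frame context
    with ReLU activation. X: (T, F) spectral frames; W: (k, F, C)."""
    k, F, C = W.shape
    T = X.shape[0]
    out = np.empty((T - k + 1, C))
    for t in range(T - k + 1):
        out[t] = np.maximum(X[t:t + k].reshape(-1) @ W.reshape(-1, C) + b, 0.0)
    return out

def rnn_identity_vector(H_seq, Wx, Wh, bh):
    """Simple Elman recurrence over the CNN outputs; the final hidden
    state serves as the single spoofing identity vector."""
    h = np.zeros(Wh.shape[0])
    for x in H_seq:
        h = np.tanh(x @ Wx + h @ Wh + bh)
    return h

# Hypothetical dimensions: 100 frames, 40 spectral bins, 5-frame context,
# 32 conv channels, 64-dimensional identity vector.
T, F, k, C, D = 100, 40, 5, 32, 64
X = rng.standard_normal((T, F))          # stand-in for a log-spectrogram
W = rng.standard_normal((k, F, C)) * 0.1
b = np.zeros(C)
Wx = rng.standard_normal((C, D)) * 0.1
Wh = rng.standard_normal((D, D)) * 0.1
bh = np.zeros(D)

frame_feats = conv_frames(X, W, b)                     # (96, 32) frame-level features
identity_vec = rnn_identity_vector(frame_feats, Wx, Wh, bh)
print(identity_vec.shape)                              # → (64,): one vector per utterance
```

The key design point this illustrates is that the CNN output length still depends on the number of frames, while the recurrence collapses that variable-length sequence into a fixed-size utterance representation suitable for a back-end classifier.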
Cite as: Gómez Alanís, A., Peinado, A.M., Gonzalez, J.A., Gomez, A. (2018) A Deep Identity Representation for Noise Robust Spoofing Detection. Proc. Interspeech 2018, 676-680, doi: 10.21437/Interspeech.2018-1909
@inproceedings{gomezalanis18_interspeech,
  author={Alejandro {Gómez Alanís} and Antonio M. Peinado and Jose A. Gonzalez and Angel Gomez},
  title={{A Deep Identity Representation for Noise Robust Spoofing Detection}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={676--680},
  doi={10.21437/Interspeech.2018-1909}
}