Abstract:
A device feature, which contains information from both the recording channel and the playback channel, is of paramount importance in replay spoofing detection. To date, no technical reports have been published on the use of device information for spoofing detection in speaker verification. In this article, we propose to build a replay device feature (RDF) extractor on the basis of a genuine-replay-pair parallel neural network training database. The target of the neural network for such an RDF extractor is constrained to a spectral domain, such as the discrete Fourier transform or the constant-Q transform. Different types of neural networks are investigated for the parallel training framework. Finally, the RDF extractor is formed by applying the discrete cosine transform to the output vector of the neural network. The bidirectional long short-term memory network gives the best performance among the investigated types of neural networks. The experimental results on the ASVspoof 2017 corpus version 2.0 indicate that a replay detection system with the proposed RDF achieves an equal error rate (EER) of 15.08%. Furthermore, by combining the RDF with constant-Q cepstral coefficients plus the log energy at the score level, the EER of the detection system can be further reduced to 8.99%. In addition, the experimental results reveal that the RDF is highly complementary to many handcrafted features.
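The three operations the abstract names (a discrete cosine transform of a network output vector, score-level combination of two systems, and the equal error rate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the orthonormal DCT-II variant, the number of retained coefficients, and the equal fusion weight are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def dct_ii(vector, num_coeffs=None):
    """Orthonormal DCT-II of a spectral-domain vector (assumed DCT variant)."""
    n = len(vector)
    num_coeffs = n if num_coeffs is None else num_coeffs
    coeffs = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                    for i, x in enumerate(vector))
        coeffs.append(scale * total)
    return coeffs

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Score-level fusion as a weighted sum (the weight is a hypothetical choice)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]

def equal_error_rate(genuine_scores, spoof_scores):
    """EER, assuming a higher score means the trial is more likely genuine."""
    best_gap, eer = None, None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: replayed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, 0.5 * (far + frr)
    return eer
```

In this sketch, `dct_ii` stands in for the final step that turns a network output vector into a cepstral-like feature, and `equal_error_rate(fuse_scores(...), ...)` mirrors how a fused system would be scored. The EER is taken at the threshold where the false acceptance and false rejection rates are closest.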
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 28)