Abstract:
Recent research shows that artifacts introduced by spoofing algorithms reside in specific frequency subbands or temporal segments, so spoofing detection performance can be improved by focusing on these regions. However, a detection system cannot easily choose an appropriate region when it encounters an unknown spoofing algorithm, which results in poor generalization. At the same time, the inter-region relationship differs noticeably between bonafide and spoofed speech. We call this inter-region relationship the spectro-temporal dependency and design a method to model it for anti-spoofing. Focusing on the general dependency difference rather than on specific regions improves the generalization ability of the detection system. We employ a graph neural network to model the dependency and incorporate prior knowledge into the graph through the design of the graph structure and edge weights, which forces the network to pay more attention to potential relationships. In addition, an attention mechanism is introduced in the graph pooling to focus on the more critical nodes. The proposed method achieves an equal error rate of 0.58% on the ASVspoof 2019 LA dataset and outperforms all competing systems.
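The headline result is reported as an equal error rate (EER), the operating point at which the false acceptance rate on spoofed speech equals the false rejection rate on bonafide speech. As a rough illustration of how such a figure is commonly obtained from detector scores, here is a minimal sketch. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bonafide", and the simple threshold sweep without interpolation are all assumptions made for this example.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) from two lists of detector scores.

    Assumption: a higher score means the utterance is more likely bonafide.
    This sweeps the observed scores as thresholds and picks the one where
    the two error rates are closest; it does not interpolate between them.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score per class")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # False acceptance: spoofed utterances scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bonafide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With the toy scores above, one bonafide and one spoofed utterance fall on the wrong side of the 0.6 threshold, so both error rates are 1/4. Evaluations on real datasets typically use many more trials and may interpolate between thresholds, so values from this sketch would not necessarily match an official scoring tool.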
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023