DOI: 10.1145/3595916.3626406
research-article

Speech Spoofing Detection Based on Graph Attention Networks with Spectral and Temporal Information

Published: 01 January 2024

Abstract

Automatic speaker verification (ASV) systems are vulnerable to synthetic speech attacks. Speech synthesis algorithms usually introduce artifacts in specific sub-bands or time segments. However, when the spoofing attack is unknown, it is difficult to choose the right domain for effective detection. In this paper, we propose a speech spoofing detection method based on graph attention networks with spectral and temporal information. First, high-level features are extracted from raw audio using SENet channel attention to enhance the spatial correlation between speech frames. Then, a spectral graph and a temporal graph are constructed from the high-level features using graph attention networks. Finally, we design a new heterogeneous multi-domain co-graph attention module to process the information from the different domains for effective speech spoofing detection. The proposed model was evaluated on the ASVspoof 2019 dataset and obtained a min t-DCF of 0.0264 and an EER of 0.94%, which is competitive performance. Experiments also show that it is effective at detecting unknown types of attacks.
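
To make the graph attention step more concrete, the following is a minimal sketch of a single-head graph attention layer in the standard formulation of Veličković et al., written in plain Python. It is not the paper's implementation: the function name `gat_layer`, the fully connected graph with self-loops, the toy weights `W` and `a`, and the example node features are all assumptions chosen for illustration. Each node could stand for one spectral sub-band or one time segment of the high-level features.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU used on the raw attention scores."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(nodes, W, a, slope=0.2):
    """Single-head graph attention over a fully connected graph.

    nodes: list of node feature vectors (hypothetical sub-band or frame features)
    W:     projection matrix applied to every node
    a:     attention vector of length 2 * (projected feature size)
    Returns (updated node features, attention weights).
    """
    z = [matvec(W, h) for h in nodes]  # projected node features
    out, attn = [], []
    for zi in z:
        # Score every neighbour j of node i from the concatenation [z_i || z_j].
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, zi + zj)), slope)
            for zj in z
        ]
        # Numerically stable softmax over the neighbours of node i.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attn.append(alpha)
        # New feature of node i: attention-weighted sum of projected neighbours.
        out.append(
            [sum(al * zj[k] for al, zj in zip(alpha, z)) for k in range(len(zi))]
        )
    return out, attn


if __name__ == "__main__":
    # Three toy nodes with two features each (illustrative values only).
    nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
    a = [0.5, -0.5, 0.25, 0.25]    # arbitrary attention vector
    updated, weights = gat_layer(nodes, W, a)
    for row in weights:
        print([round(x, 3) for x in row])
```

Under these assumptions, a spectral graph and a temporal graph would each be passed through a layer of this kind with their own weights. How the two graphs are then combined in the heterogeneous multi-domain co-graph attention module is specific to the paper and is not reproduced here.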

Cited By

  • (2024) Integrating Self-Supervised Pre-Training With Adversarial Learning for Synthesized Song Detection. 2024 IEEE Spoken Language Technology Workshop (SLT), 795-802. DOI: 10.1109/SLT61566.2024.10832322. Online publication date: 2-Dec-2024
  • (2024) Spoofing Countermeasure for Fake Speech Detection Using Brute Force Features. Computer Speech & Language, 101732. DOI: 10.1016/j.csl.2024.101732. Online publication date: Oct-2024

      Published In

      MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
      December 2023
      745 pages
      ISBN:9798400702051
      DOI:10.1145/3595916
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. anti-spoofing
      2. attention mechanism
      3. graph attention networks
      4. speech spoofing detection

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MMAsia '23
      MMAsia '23: ACM Multimedia Asia
      December 6 - 8, 2023
      Tainan, Taiwan

      Acceptance Rates

      Overall Acceptance Rate 59 of 204 submissions, 29%
