research-article

Speech Spoofing Detection Based on Graph Attention Networks with Spectral and Temporal Information

Authors:

Jianqiang Zhang,

Xiaoming Wu

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

Article No.: 34, Pages 1 - 7

https://doi.org/10.1145/3595916.3626406

Published: 01 January 2024

Abstract

Automatic speaker verification (ASV) systems are vulnerable to synthetic speech attacks. Speech synthesis algorithms usually leave artifacts in specific sub-bands or time segments, but under unknown spoofing attacks it is difficult to know in advance which domain to examine for effective detection. In this paper, we propose a speech spoofing detection method based on graph attention networks that exploits both spectral and temporal information. First, high-level features are extracted from the raw audio using SENet channel attention to enhance the spatial correlation between speech frames. Then, a spectral graph and a temporal graph are constructed from these high-level features using graph attention networks. Finally, we design a new heterogeneous multi-domain co-graph attention module that jointly processes information from the different domains for effective speech spoofing detection. Evaluated on the ASVspoof 2019 dataset, the proposed model achieves a min t-DCF of 0.0264 and an EER of 0.94%, which is competitive performance. Experiments also show that it remains effective against unknown attack types.
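The abstract names two generic building blocks: SENet-style squeeze-and-excitation channel attention and graph attention over spectral and temporal nodes. The sketch below illustrates them under stated assumptions; it is not the authors' implementation. Assumptions: NumPy instead of a deep-learning framework, random untrained weights, a high-level feature map of shape (channels, sub-bands, frames), AASIST-style node construction (max of the absolute feature map over the time axis for spectral nodes and over the frequency axis for temporal nodes), and single-head attention on a fully connected graph. The final fusion step simply merges both node sets into one graph; it is a hypothetical stand-in, not the paper's heterogeneous multi-domain co-graph attention module. The names `se_block` and `gat_layer` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def se_block(x, w1, w2):
    """Squeeze-and-excitation channel attention on a (C, F, T) feature map."""
    z = x.mean(axis=(1, 2))                       # squeeze: global average per channel -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)              # bottleneck FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # FC + sigmoid -> channel weights in (0, 1)
    return x * s[:, None, None]                   # rescale every channel


def gat_layer(h, w, a, slope=0.2):
    """Single-head graph attention on a fully connected graph (self-loops included).

    h: (N, D_in) node features, w: (D_in, D_out) projection, a: (2 * D_out,) attention vector.
    """
    wh = h @ w                                    # projected node features (N, D_out)
    d = wh.shape[1]
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), split into the i-part and the j-part
    e = (wh @ a[:d])[:, None] + (wh @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = e - e.max(axis=1, keepdims=True)          # numerical stability before exp
    exp_e = np.exp(e)
    alpha = exp_e / exp_e.sum(axis=1, keepdims=True)  # softmax over neighbours j
    return alpha @ wh, alpha                      # attention-weighted aggregation


C, F, T, D = 8, 12, 20, 16                        # channels, sub-bands, frames, node dim
x = rng.standard_normal((C, F, T))                # stand-in for the encoder's high-level features
x = se_block(x, rng.standard_normal((C // 2, C)), rng.standard_normal((C, C // 2)))

# AASIST-style node construction (assumption): max of |x| over one axis
spec_nodes = np.abs(x).max(axis=2).T              # (F, C): one node per sub-band
temp_nodes = np.abs(x).max(axis=1).T              # (T, C): one node per time frame

spec_out, spec_att = gat_layer(spec_nodes, rng.standard_normal((C, D)), rng.standard_normal(2 * D))
temp_out, temp_att = gat_layer(temp_nodes, rng.standard_normal((C, D)), rng.standard_normal(2 * D))

# Hypothetical naive fusion, NOT the paper's co-graph module: one graph over both node sets
joint = np.concatenate([spec_out, temp_out], axis=0)           # (F + T, D)
joint_out, _ = gat_layer(joint, rng.standard_normal((D, D)), rng.standard_normal(2 * D))
embedding = np.concatenate([joint_out.max(axis=0), joint_out.mean(axis=0)])  # (2 * D,) readout
print(spec_out.shape, temp_out.shape, embedding.shape)  # -> (12, 16) (20, 16) (32,)
```

Separating the spectral and temporal graphs lets attention weights single out the sub-bands or frames where artifacts concentrate, which is the motivation the abstract gives; in a trained model the weights would be learned rather than random.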

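The abstract reports results as an equal error rate (EER) and a min t-DCF. As an illustration of the first metric only, the sketch below computes an EER in plain Python, assuming the countermeasure outputs one score per utterance with higher scores meaning "more likely bona fide" and that an utterance is accepted when its score is at least the threshold. It sweeps the observed scores as candidate thresholds and returns the point where the bona fide rejection rate and the spoof acceptance rate are closest. The function name `compute_eer` and the toy scores are hypothetical; published ASVspoof results are normally produced with the challenge's official scoring scripts. The min t-DCF additionally requires ASV scores and cost parameters and is not shown.

```python
from bisect import bisect_left


def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate by sweeping observed scores as thresholds.

    Assumes higher scores mean 'more likely bona fide'; an utterance is
    accepted as bona fide when its score >= threshold.
    Returns (eer, threshold).
    """
    bona = sorted(bonafide_scores)
    spoof = sorted(spoof_scores)
    best_gap, best_eer, best_thr = float("inf"), None, None
    for thr in sorted(set(bona) | set(spoof)):
        frr = bisect_left(bona, thr) / len(bona)                   # bona fide rejected (score < thr)
        far = (len(spoof) - bisect_left(spoof, thr)) / len(spoof)  # spoof accepted (score >= thr)
        if abs(frr - far) < best_gap:
            best_gap, best_eer, best_thr = abs(frr - far), (frr + far) / 2, thr
    return best_eer, best_thr


eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(f"EER = {eer:.2%} at threshold {thr}")  # -> EER = 25.00% at threshold 0.6
```

Sorting both score lists once and using binary search keeps the sweep at O(n log n), which matters for full evaluation sets with tens of thousands of trials.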
Cited By

Wang Y, Du Y, Zhang D, Zheng R, and Deng J. 2024. Integrating Self-Supervised Pre-Training With Adversarial Learning for Synthesized Song Detection. In 2024 IEEE Spoken Language Technology Workshop (SLT), 795–802. Online publication date: 2-Dec-2024. https://doi.org/10.1109/SLT61566.2024.10832322

Mirza A and Al-Talabani A. 2024. Spoofing Countermeasure for Fake Speech Detection Using Brute Force Features. Computer Speech & Language, 101732. Online publication date: Oct-2024. https://doi.org/10.1016/j.csl.2024.101732

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Security and privacy
  1. Security services
    1. Authentication

Information

Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

December 2023

745 pages

ISBN:9798400702051

DOI:10.1145/3595916

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Major Innovation Projects of the Pilot Project of Science, Education and Industry Integration
National Natural Science Foundation of China
National Key R&D Program of China

Conference

MMAsia '23

MMAsia '23: ACM Multimedia Asia

December 6 - 8, 2023

Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%

Bibliometrics

Total Citations: 2
Total Downloads: 127
Downloads (last 12 months): 80
Downloads (last 6 weeks): 6

Reflects downloads up to 28 Feb 2025

Affiliations

Peng Zhang

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0000-0002-4851-2094

Yida Chen

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0009-0006-9248-7304

Meijuan Li

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0009-0003-5614-7044

Hui Zhao

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0009-0005-1150-943X

Jianqiang Zhang

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0009-0007-4670-2197

Fuqiang Wang

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0000-0003-2843-0136

Xiaoming Wu

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), CN and Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, China

https://orcid.org/0009-0004-3160-0620
