Identification of Synthetic Spoofed Speech with Deep Capsule Network

  • Conference paper
  • Frontiers in Cyber Security (FCS 2021)

Abstract

State-of-the-art models for speech synthesis and voice conversion pose a serious threat to automatic speaker verification (ASV) systems. Indeed, it is difficult even for human listeners to perceive the subtle differences between bonafide speech and the spoofed speech produced by these models. The ASVspoof 2019 challenge, jointly launched by several world-leading research institutions, is the largest and most comprehensive challenge for spoofed speech identification. In this work, a countermeasure system for ASVspoof 2019 is proposed based on cepstral features and a deep capsule network. MFCC and CQCC features are extracted as the input to the proposed network. The convolutional layers and the routing strategy of the capsule network are specifically designed to distinguish bonafide speech from spoofed speech. Experimental results on the ASVspoof 2019 LA evaluation set show that the proposed deep capsule network improves the t-DCF and EER scores of the baseline algorithms by 31% and 37%, respectively.
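
The front end described above pairs two cepstral representations. As a rough illustration, the Python sketch below extracts MFCCs with librosa and approximates CQCC-style features by taking the DCT of the log-power constant-Q spectrum; true CQCC additionally resamples the geometrically spaced CQT bins uniformly before the DCT, so the function names and parameters here (extract_cqcc_like, n_coeff) are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
import librosa
import scipy.fftpack


def extract_mfcc(path, n_mfcc=20):
    # Standard Mel-frequency cepstral coefficients, shape (n_mfcc, n_frames).
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)


def extract_cqcc_like(path, n_coeff=20):
    # CQCC-like features: log-power constant-Q spectrum followed by a DCT.
    # NOTE: a simplification of true CQCC, which uniformly resamples the
    # CQT spectrum before taking the DCT.
    y, sr = librosa.load(path, sr=16000)
    cqt_mag = np.abs(librosa.cqt(y, sr=sr))      # constant-Q magnitude spectrum
    log_power = np.log(cqt_mag ** 2 + 1e-10)     # log power, floored for stability
    return scipy.fftpack.dct(log_power, axis=0, norm='ortho')[:n_coeff]
```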
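
The classifier itself is a capsule network whose routing strategy is tailored to the bonafide/spoof task. The sketch below shows generic dynamic routing-by-agreement in PyTorch, following Sabour et al.; the paper's specific convolutional front end and layer sizes are not given in the abstract, so the shapes and iteration count here are assumptions.

```python
import torch
import torch.nn.functional as F


def squash(s, dim=-1):
    # Capsule non-linearity: shrinks short vectors toward zero and long
    # vectors toward unit length while preserving direction.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + 1e-9)


def dynamic_routing(u_hat, n_iters=3):
    # u_hat: prediction vectors of shape (batch, n_in, n_out, dim_out).
    # Returns output capsules of shape (batch, n_out, dim_out).
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=2)                       # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)      # weighted sum of predictions
        v = squash(s)                                 # squashed output capsules
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)  # agreement update
    return v
```

For a two-class bonafide/spoof decision, the length of each output capsule vector can be read as that class's score.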
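
The reported gains are in terms of EER and t-DCF. EER is the operating point where the false-acceptance and false-rejection rates coincide; a minimal computation from raw NumPy score arrays is sketched below (t-DCF additionally weights errors by the costs of the tandem ASV system and is omitted here). The function name and inputs are hypothetical.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    # Sweep every observed score as a decision threshold (accept if >= t)
    # and return the point where FAR and FRR are closest.
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])    # false acceptance
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])  # false rejection
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```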

Author information

Correspondence to Diqun Yan.

Copyright information

© 2022 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mao, T., Yan, D., Gong, Y., Wang, R. (2022). Identification of Synthetic Spoofed Speech with Deep Capsule Network. In: Cao, C., Zhang, Y., Hong, Y., Wang, D. (eds) Frontiers in Cyber Security. FCS 2021. Communications in Computer and Information Science, vol 1558. Springer, Singapore. https://doi.org/10.1007/978-981-19-0523-0_17

  • DOI: https://doi.org/10.1007/978-981-19-0523-0_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0522-3

  • Online ISBN: 978-981-19-0523-0

  • eBook Packages: Computer Science, Computer Science (R0)
