
Fake Speech Detection Using Modulation Spectrogram

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

Speech technologies such as automatic speaker verification (ASV) systems can now verify a speaker’s identity with high accuracy, and they are therefore widely deployed in biometrics and banking. With advances in deep learning, deepfakes have become a primary threat to these ASV systems, as researchers continue to propose methods that generate speech with characteristics indistinguishable from natural speech. Various techniques exist for fake speech detection, but they tend to be tied to a specific dataset or to the source used to generate the fake speech. In this work, we propose modulation spectrogram-based fake speech detection and show that the modulation spectrogram can separate real from fake speech under speaker, session, gender, domain, and generation-source variation. The proposed approach is evaluated on the CMU Arctic, LJ Speech, and LibriTTS datasets, and classification accuracy is reported. The accuracy scores show that the proposed approach can classify fake speech.
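
The abstract describes the pipeline only at a high level, so the sketch below illustrates one plausible way to compute a modulation spectrogram feature for real-vs-fake classification: short-time subband envelopes followed by a second spectral analysis across frames. The function name, mel-band front end, frame sizes, and the downstream classifier mentioned in the comments are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of a modulation spectrogram feature extractor, assuming a
# mel-band front end; parameter choices here are illustrative only.
import numpy as np
import librosa


def modulation_spectrogram(path, sr=16000, n_mels=40, n_fft=512, hop_length=160):
    """Return an acoustic-frequency x modulation-frequency map for one file."""
    y, sr = librosa.load(path, sr=sr)

    # 1) Short-time spectral analysis: mel-band magnitude envelopes over time.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    env = np.log1p(mel)                          # (n_mels, n_frames) band envelopes

    # 2) Modulation analysis: FFT of each band envelope across frames,
    #    capturing how slowly or quickly each band's energy fluctuates.
    env = env - env.mean(axis=1, keepdims=True)  # remove per-band DC offset
    mod = np.abs(np.fft.rfft(env, axis=1))       # (n_mels, n_mod_bins)

    # Modulation-frequency axis in Hz (envelope sampling rate = sr / hop_length).
    frame_rate = sr / hop_length
    mod_freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
    return mod, mod_freqs


# Example usage (file names are placeholders): the resulting 2-D features can
# be fed to any image-style classifier, e.g. a CNN, for real-vs-fake decisions.
# real_feat, _ = modulation_spectrogram("real.wav")
# fake_feat, _ = modulation_spectrogram("fake.wav")
```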

Notes

  1. Fake Speech Dataset.

Acknowledgments

This work was funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, under the project titled “Fake Speech Detection Using Deep Learning Framework”.

Author information

Corresponding author

Correspondence to Ayush Agarwal.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Magazine, R., Agarwal, A., Hedge, A., Prasanna, S.R.M. (2022). Fake Speech Detection Using Modulation Spectrogram. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science (LNAI), vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_39

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science, Computer Science (R0)
