Fake Speech Detection Using OpenSMILE Features

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

Advances in deep learning have produced methods that generate fake speech which an ordinary listener cannot perceptually distinguish from natural speech. Because fake speech can be used maliciously to harm society or individuals (for example, through impersonation or the spread of fake news), methods for detecting it are needed. The literature proposes several features for identifying fake speech, each contributing differently to the detection task. In this work, we propose combining the most common speech features for fake speech detection. The openSMILE toolkit is an open-source library that extracts these common speech features into an 88-dimensional vector, which we use as input to machine learning models that detect fake speech. To check the robustness of the proposed method, we test it on several datasets that contain session, gender, domain, and synthesizer variability. The experimental results show that the openSMILE features detect fake speech with high performance under session, gender, and synthesizer variability, whereas performance is low under domain variability.
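The pipeline described above (a fixed-length openSMILE feature vector per utterance, fed to a machine learning classifier) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the 88-dimensional vector is openSMILE's eGeMAPSv02 functionals set, extracted through the `opensmile` Python package (`opensmile.Smile` with `FeatureSet.eGeMAPSv02` and `FeatureLevel.Functionals`). Because the abstract does not name the models used, it swaps in a plain logistic-regression classifier trained by gradient descent. The helper `make_toy_data` is hypothetical and produces synthetic vectors in place of features from real and fake recordings.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
N_FEATURES = 88  # dimensionality of the openSMILE vector in the abstract


def extract_features(wav_path: str) -> List[float]:
    """ASSUMPTION: the 88 features are openSMILE's eGeMAPSv02 functionals.
    Needs `pip install opensmile`; imported lazily so the rest runs without it."""
    import opensmile

    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    # process_file returns a one-row pandas DataFrame with 88 feature columns
    return smile.process_file(wav_path).iloc[0].tolist()


def fit_standardizer(X: Matrix) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and standard deviation (a zero std becomes 1)."""
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def standardize(X: Matrix, means: List[float], stds: List[float]) -> Matrix:
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: Matrix, y: Sequence[int], lr: float = 0.1, epochs: int = 100
) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X: Matrix, w: List[float], b: float) -> List[int]:
    """1 = fake, 0 = real; threshold the logit at zero."""
    return [int(sum(wi * xi for wi, xi in zip(w, row)) + b >= 0) for row in X]


def make_toy_data(n_per_class: int, seed: int) -> Tuple[Matrix, List[int]]:
    """HYPOTHETICAL stand-in for extracted features: 'fake' rows (label 1)
    are shifted by +3 in their first five dimensions."""
    rng = random.Random(seed)
    X: Matrix = []
    y: List[int] = []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
            if label == 1:
                for j in range(5):
                    row[j] += 3.0
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_toy_data(60, seed=0)
    X_test, y_test = make_toy_data(60, seed=1)
    means, stds = fit_standardizer(X_train)  # statistics from training data only
    w, b = train_logreg(standardize(X_train, means, stds), y_train)
    pred = predict(standardize(X_test, means, stds), w, b)
    acc = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
    print(f"toy test accuracy: {acc:.2f}")
```

With real recordings, the feature matrix would be built as `[extract_features(p) for p in wav_paths]`. The standardization statistics are fitted on the training split only and reused for the test split, so no test information leaks into training. To probe the variability conditions studied in the paper, the same code could be trained on one condition (for example, one synthesizer or one domain) and evaluated on another; the toy data here do not model those conditions.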

Notes

  1. https://docs.google.com/spreadsheets/d/16_7RUgXKOty4s4MkD_db0EKmvqtqoTUg/edit#gid=1002269497.

  2. https://drive.google.com/drive/folders/1sd_QzcUNnbiaWq7L0ykMP7Xmk-zOuxTi?usp=sharing.

Acknowledgments

This work is funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, under the project titled “Fake Speech detection using Deep Learning Framework”.

Author information

Corresponding author

Correspondence to Ayush Agarwal.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Kumar, D., Patil, P.K.V., Agarwal, A., Prasanna, S.R.M. (2022). Fake Speech Detection Using OpenSMILE Features. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol. 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_35

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science, Computer Science (R0)
