Abstract
Advances in deep learning have enabled methods that generate fake speech which an ordinary listener cannot perceptually distinguish from natural speech. Because fake speech can be used maliciously to harm society or an individual (impersonation, spreading fake news, etc.), methods to detect it are needed. Several features have been proposed in the literature for identifying fake speech, each contributing differently to the detection task. In this work, we propose using the most common speech features together for fake speech detection. The openSMILE toolkit is an open-source library that extracts the most common speech features and stores them in an 88-dimensional vector. We feed these features to machine learning models to detect fake speech. To assess the robustness of the proposed method, we test it on various datasets covering session, gender, domain, and synthesizer variability. The experimental results show that the openSMILE features detect fake speech with high performance under session, gender, and synthesizer variability, whereas performance is low under domain variability.
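The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact setup: the openSMILE feature-extraction step is stubbed with synthetic 88-dimensional vectors standing in for eGeMAPS functionals, and the SVM classifier and the class separation are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for openSMILE extraction: each utterance is summarized
# as an 88-dimensional feature vector (as with eGeMAPS functionals).
rng = np.random.default_rng(0)
n_per_class, n_features = 200, 88
natural = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
# Shift the fake-speech cluster so the two classes are separable.
fake = rng.normal(loc=1.5, scale=1.0, size=(n_per_class, n_features))

X = np.vstack([natural, fake])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])  # 0 = natural, 1 = fake

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize the features, then train an RBF-kernel SVM detector.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the stubbed extraction step would be replaced by running openSMILE on each utterance, and the resulting real/fake labels come from the dataset metadata.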
Acknowledgments
This work is funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, under the project titled "Fake Speech Detection Using Deep Learning Framework".
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Kumar, D., Patil, P.K.V., Agarwal, A., Prasanna, S.R.M. (2022). Fake Speech Detection Using OpenSMILE Features. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol. 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2