Abstract
Advances in deep learning have enabled methods that generate fake speech which an ordinary listener cannot perceptually distinguish from natural speech. Because fake speech can be used maliciously to harm society or an individual (impersonation, spreading fake news, etc.), methods to detect it are needed. Several features have been proposed in the literature for identifying fake speech, each contributing differently to the detection task. In this work, we propose using the most common speech features together for fake speech detection. The openSMILE toolkit is an open-source library that extracts the most common speech features and stores them in an 88-dimensional vector. We feed these features to machine learning models to detect fake speech. To assess the robustness of the proposed method, we test it on various datasets covering session, gender, domain, and synthesizer variability. The experimental results show that the openSMILE features detect fake speech with high performance under session, gender, and synthesizer variability, whereas performance is low under domain variability.
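The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact setup: the openSMILE feature-extraction step is stubbed with synthetic 88-dimensional vectors standing in for eGeMAPS functionals, and the SVM classifier and the class separation are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for openSMILE extraction: each utterance is summarized
# as an 88-dimensional feature vector (as with eGeMAPS functionals).
rng = np.random.default_rng(0)
n_per_class, n_features = 200, 88
natural = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
# Shift the fake-speech cluster so the two classes are separable.
fake = rng.normal(loc=1.5, scale=1.0, size=(n_per_class, n_features))

X = np.vstack([natural, fake])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])  # 0 = natural, 1 = fake

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize the features, then train an RBF-kernel SVM detector.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the stubbed extraction step would be replaced by running openSMILE on each utterance, and the resulting real/fake labels come from the dataset metadata.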
Acknowledgments
This work is funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, under the project titled "Fake Speech Detection Using Deep Learning Framework".
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Kumar, D., Patil, P.K.V., Agarwal, A., Prasanna, S.R.M. (2022). Fake Speech Detection Using OpenSMILE Features. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol. 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2