Abstract
Voice Liveness Detection (VLD) has emerged as a successful technique for detecting spoofing attacks on Automatic Speaker Verification (ASV) systems. The presence of pop noise in the speech signal of a live speaker provides the basic cue for distinguishing genuine speech from spoofed speech. Pop noise is produced by spontaneous breathing while uttering a certain class of phonemes, and it has low-frequency characteristics. It emerges as a burst at the lips and is captured by the ASV system because the speaker and the microphone are close to each other; it therefore indicates the liveness of the speaker and provides the basis of VLD. Pop noise characteristics are generally absent in spoofed speech because the original speaker and the attacker’s recording device are usually far apart. In this context, we explore the relative significance of phase information for detecting pop noise by utilizing a phase-based feature, namely the modified group delay function (MGDF). Further, various spectral smoothing techniques, namely cepstral smoothing, spectral root smoothing, and the linear prediction spectrum, are analyzed in order to enhance the spectral representation of the speech signal in the MGDF. Better accuracy, 80.13% on the development set and 69.79% on the evaluation set, is obtained when spectral root smoothing is employed in the MGDF.
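As a rough illustration of the feature named in the abstract, the following Python sketch computes a modified group delay function for a single frame using cepstral smoothing, following the widely used formulation tau(w) = (X_R Y_R + X_I Y_I) / S(w)^(2*gamma), followed by sign(tau) |tau|^alpha, where Y is the spectrum of n x(n) and S is the smoothed magnitude spectrum. The function names, the FFT size, the lifter length, and the values of alpha and gamma are assumptions chosen for illustration; they are not the settings reported by the authors, and the spectral root and linear prediction variants are not shown.

```python
import numpy as np


def cepstral_smooth(mag, n_lifter=30):
    """Smooth a one-sided magnitude spectrum by low-quefrency liftering.

    n_lifter (number of cepstral coefficients kept) is an illustrative
    assumption, not a value taken from the paper.
    """
    log_mag = np.log(mag + 1e-10)          # small floor avoids log(0)
    cep = np.fft.irfft(log_mag)            # real cepstrum, length 2 * (len(mag) - 1)
    lifter = np.zeros_like(cep)
    lifter[:n_lifter] = 1.0                # keep low quefrencies
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0     # and their mirrored counterparts
    return np.exp(np.fft.rfft(cep * lifter).real)


def modified_group_delay(frame, n_fft=512, alpha=0.4, gamma=0.9, n_lifter=30):
    """Modified group delay of one frame (hypothetical helper).

    alpha and gamma are tuning parameters; the defaults here are
    assumptions for the sketch only.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)          # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)      # spectrum of n * x(n)
    S = cepstral_smooth(np.abs(X), n_lifter)
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))
    return np.sign(tau) * np.abs(tau) ** alpha


# Usage: a unit impulse delayed by 5 samples has a constant group delay of 5,
# so every frequency bin should be close to 5 ** alpha.
frame = np.zeros(64)
frame[5] = 1.0
mgd = modified_group_delay(frame, n_fft=128, n_lifter=10)
```

The delayed-impulse input is a convenient sanity check because its magnitude spectrum is flat, so the smoothing step leaves it unchanged and the output depends only on the delay and on alpha.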
Acknowledgments
The authors would like to thank the authorities at DA-IICT, Gandhinagar for providing resources and kind support towards the completion of this research work.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, S., Khoria, K., Patil, H.A. (2021). Modified Group Delay Function Using Different Spectral Smoothing Techniques for Voice Liveness Detection. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_58
DOI: https://doi.org/10.1007/978-3-030-87802-3_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)