Modified Group Delay Function Using Different Spectral Smoothing Techniques for Voice Liveness Detection

Singh, Shrishti; Khoria, Kuldeep; Patil, Hemant A.

doi:10.1007/978-3-030-87802-3_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Voice Liveness Detection (VLD) has emerged as a successful technique for detecting spoofing attacks on Automatic Speaker Verification (ASV) systems. The presence of pop noise in the speech signal of a live speaker provides the basic cue for distinguishing genuine speech from spoofed speech. Pop noise is produced by spontaneous breathing while uttering a certain class of phonemes, and it has low-frequency characteristics. It emerges as a burst at the lips and is captured by the ASV system because the speaker and the microphone are close together; it therefore indicates the liveness of the speaker and provides the basis of VLD. Pop noise characteristics are absent in spoofed speech because the original speaker and the attacker’s recording device are generally far apart. In this context, we explore the relative significance of phase information for detecting pop noise by using a phase-based feature, namely the modified group delay function (MGDF). Further, various spectral smoothing techniques, such as cepstral smoothing, spectral root, and linear prediction spectrum, are analyzed in order to enhance the spectral representation of speech signals through the MGDF. Better accuracy, 80.13% on the development set and 69.79% on the evaluation set, is obtained when spectral root smoothing is employed in the MGDF.
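
To make the feature concrete, the following is a minimal sketch of a modified group delay computation for a single frame, in the commonly cited form: the numerator X_R·Y_R + X_I·Y_I (with Y the DFT of n·x[n]) is divided by a smoothed magnitude spectrum raised to 2γ, and the result is compressed by α with its sign preserved. It is not the authors' implementation. The function names, the parameter values (`alpha`, `gamma`, `n_keep`, `root`), and in particular the way spectral root smoothing is plugged in are assumptions made for illustration. A direct DFT is used only to keep the sketch dependency-free; an FFT would normally be used.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT; adequate for one short frame (use an FFT in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def lifter_smooth(values, n_keep):
    """Smooth a real, even-symmetric spectrum by keeping low-quefrency terms only."""
    n = len(values)
    # For a real, even sequence the inverse DFT equals the forward DFT divided by N.
    ceps = [c.real / n for c in dft(values)]
    # Keep a symmetric low-quefrency window and zero everything else.
    liftered = [c if (t < n_keep or t > n - n_keep) else 0.0
                for t, c in enumerate(ceps)]
    return [v.real for v in dft(liftered)]


def mgdf(frame, smoothing="root", alpha=0.4, gamma=0.9, n_keep=8,
         root=0.5, eps=1e-8):
    """Modified group delay of one frame (illustrative parameter values).

    smoothing="cepstral": lifter the log-magnitude spectrum.
    smoothing="root":     lifter |X|**root, then undo the root (assumed variant).
    """
    X = dft(frame)
    Y = dft([t * s for t, s in enumerate(frame)])  # DFT of n * x[n]
    mag = [abs(v) for v in X]

    if smoothing == "cepstral":
        logs = [math.log(m + eps) for m in mag]
        S = [math.exp(v) for v in lifter_smooth(logs, n_keep)]
    elif smoothing == "root":
        rooted = [m ** root for m in mag]
        # Liftering can undershoot below zero, so clamp before undoing the root.
        S = [max(v, eps) ** (1.0 / root) for v in lifter_smooth(rooted, n_keep)]
    else:
        raise ValueError("smoothing must be 'cepstral' or 'root'")

    out = []
    for x, y, s in zip(X, Y, S):
        tau = (x.real * y.real + x.imag * y.imag) / (s ** (2 * gamma) + eps)
        # Compress the dynamic range while keeping the sign of the group delay.
        out.append(math.copysign(abs(tau) ** alpha, tau))
    return out


if __name__ == "__main__":
    # A unit impulse delayed by 3 samples has a flat group delay of 3 samples.
    frame = [0.0] * 32
    frame[3] = 1.0
    print(mgdf(frame, smoothing="root")[:4])
```

For the delayed impulse in the usage example, the magnitude spectrum is flat, so both smoothing options leave it unchanged and every bin comes out close to `3 ** 0.4`, the compressed version of the true 3-sample delay. A linear prediction spectrum, the third smoothing option named in the abstract, could replace `S` in the same way but is left out here for brevity.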

Acknowledgments

The authors would like to thank the authorities at DA-IICT, Gandhinagar for providing resources and kind support towards the completion of this research work.

Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India
Shrishti Singh, Kuldeep Khoria & Hemant A. Patil

Corresponding author

Correspondence to Shrishti Singh.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Singh, S., Khoria, K., Patil, H.A. (2021). Modified Group Delay Function Using Different Spectral Smoothing Techniques for Voice Liveness Detection. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_58

DOI: https://doi.org/10.1007/978-3-030-87802-3_58
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
