Modified Group Delay Function Using Different Spectral Smoothing Techniques for Voice Liveness Detection

  • Conference paper
  • First Online:
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Abstract

Voice Liveness Detection (VLD) has emerged as a successful technique for detecting spoofing attacks on Automatic Speaker Verification (ASV) systems. The presence of pop noise in the speech signal of a live speaker provides the basic cue for distinguishing genuine from spoofed speech. Pop noise, which has low-frequency characteristics, is produced by spontaneous breathing while a speaker utters certain classes of phonemes. It emerges as a burst at the lips and, because the speaker and the microphone are close to each other, it is captured by the ASV system; it therefore indicates the liveness of the speaker and provides the basis of VLD. Pop noise characteristics are absent in spoofed speech because the original speaker and the attacker's recording device are generally far apart. In this context, we explore the relative significance of phase information for detecting pop noise by using a phase-based feature, namely the modified group delay function (MGDF). Further, we analyze various spectral smoothing techniques, namely cepstral smoothing, spectral root smoothing, and the linear prediction (LP) spectrum, in order to enhance the spectral representation of speech signals in the MGDF. Higher accuracy of 80.13% on the development set and 69.79% on the evaluation set is obtained when spectral root smoothing is employed in the MGDF.
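The abstract names the processing steps without spelling them out. As a rough illustration only, the sketch below computes an MGDF for a single frame using the formulation commonly attributed to Hegde, Murthy and Gadde: τ(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / S(ω)^(2γ) and τ_m(ω) = sign(τ(ω))·|τ(ω)|^α, where X and Y are the DFTs of x[n] and n·x[n], and S(ω) is a smoothed magnitude spectrum. The three interchangeable ways of obtaining S(ω) mirror the techniques listed above. This is not the authors' implementation: the function names, α = 0.4, γ = 0.9, the 512-point FFT, the 30-bin lifter, the root exponent 0.3 and the LP order 16 are all illustrative assumptions, and the spectral-root variant is one plausible reading of root-cepstrum smoothing.

```python
import numpy as np


def _low_quefrency_lifter(n_fft, n_keep):
    """Window keeping the first n_keep cepstral bins and their symmetric mirror."""
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0
    return lifter


def cepstral_envelope(mag, n_keep=30):
    """Cepstral smoothing: lifter the real cepstrum of log|X|, then map back."""
    ceps = np.fft.ifft(np.log(mag + 1e-10)).real
    smoothed_log = np.fft.fft(ceps * _low_quefrency_lifter(len(mag), n_keep)).real
    return np.exp(smoothed_log)


def spectral_root_envelope(mag, root=0.3, n_keep=30):
    """Spectral-root smoothing (assumed form): lifter the root cepstrum of |X|**root."""
    root_ceps = np.fft.ifft(mag ** root).real
    smoothed = np.fft.fft(root_ceps * _low_quefrency_lifter(len(mag), n_keep)).real
    return np.maximum(smoothed, 1e-10) ** (1.0 / root)  # undo the root


def lp_envelope(frame, n_fft, order=16):
    """LP envelope sqrt(E) / |A(e^jw)| via autocorrelation + Levinson-Durbin."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order].copy()  # lags 0..order
    r[0] += 1e-10  # avoid division by zero on a silent frame
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction error energy
    return np.sqrt(err) / np.abs(np.fft.fft(a, n_fft))


def modified_group_delay(frame, n_fft=512, smoothing="cepstral", alpha=0.4, gamma=0.9):
    """Return the MGDF of one frame over bins 0..n_fft/2 (hypothetical helper)."""
    n = np.arange(len(frame))
    X = np.fft.fft(frame, n_fft)      # DFT of x[n]
    Y = np.fft.fft(n * frame, n_fft)  # DFT of n * x[n]
    mag = np.abs(X)
    if smoothing == "cepstral":
        S = cepstral_envelope(mag)
    elif smoothing == "spectral_root":
        S = spectral_root_envelope(mag)
    elif smoothing == "lp":
        S = lp_envelope(frame, n_fft)
    else:
        raise ValueError(f"unknown smoothing: {smoothing!r}")
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + 1e-10)
    tau_m = np.sign(tau) * np.abs(tau) ** alpha  # compress the dynamic range
    return tau_m[: n_fft // 2 + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.hamming(400) * rng.standard_normal(400)  # stand-in for a speech frame
    for method in ("cepstral", "spectral_root", "lp"):
        feature = modified_group_delay(frame, smoothing=method)
        print(method, feature.shape)
```

Each call returns 257 values (bins 0 to n_fft/2). Keeping the smoothing step as a swappable function makes it easy to compare the three envelopes on the same frames; in practice the MGDF is often further compressed with a DCT into cepstral-like coefficients before classification, which this sketch leaves out.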

Acknowledgments

The authors would like to thank the authorities at DA-IICT, Gandhinagar for providing resources and kind support towards the completion of this research work.

Author information

Corresponding author

Correspondence to Shrishti Singh.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, S., Khoria, K., Patil, H.A. (2021). Modified Group Delay Function Using Different Spectral Smoothing Techniques for Voice Liveness Detection. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science (LNAI), vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_58

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_58

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science, Computer Science (R0)
