Playback Speech Detection Application Based on Cepstrum Feature

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1179)

Abstract

With the popularity of portable recording devices, playback speech has become one of the most important means of attacking speaker authentication systems. Comparison with the original speech shows that playback speech differs from it in the high-frequency band, and also in the low-frequency band because of the different recording equipment. Based on this finding, a detection algorithm is presented that extracts representative features from both bands. In the high-frequency band, inverse-Mel filters (I-Mel) are used to extract speaker eigenvector sequences. In the low-frequency band, linear filters (Linear) are combined with Mel filters (Mel) to avoid superposition of characteristic parameters. The layers are fused into an L-M-I filter bank, from which new cepstral features are formed. Experimental results show that the method detects playback speech effectively, with an equal error rate (EER) of 2.63%. Compared with the traditional feature extraction methods MFCC, CQCC, LFCC, and IMFCC, the EER decreases by 12.79%, 9.61%, 4.45%, and 3.28%, respectively.
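The filter-bank fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no band edges, filter counts, FFT size, or number of cepstral coefficients, so the split points (1000 Hz and 4000 Hz), the 8/12/20 filter counts, the in-band mirroring used for the inverse-Mel spacing, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def linear_edges(f_lo, f_hi, n):
    # n filters need n + 2 edge points, evenly spaced in Hz.
    return np.linspace(f_lo, f_hi, n + 2)

def mel_edges(f_lo, f_hi, n):
    # Evenly spaced on the Mel scale: dense at the low end of the band.
    return mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n + 2))

def inverse_mel_edges(f_lo, f_hi, n):
    # Assumption: mirror the Mel spacing inside the band so that the
    # filters become dense at the high end instead of the low end.
    return f_lo + f_hi - mel_edges(f_lo, f_hi, n)[::-1]

def triangular_bank(edges_hz, n_fft, sr):
    # One triangular filter per consecutive (low, centre, high) edge triple.
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((len(edges_hz) - 2, freqs.size))
    for i in range(len(edges_hz) - 2):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def lmi_bank(sr=16000, n_fft=512, splits=(1000.0, 4000.0), n_filters=(8, 12, 20)):
    # Hypothetical layout: Linear on [0, f1], Mel on [f1, f2], I-Mel on [f2, sr/2].
    f1, f2 = splits
    n_lin, n_mel, n_imel = n_filters
    return np.vstack([
        triangular_bank(linear_edges(0.0, f1, n_lin), n_fft, sr),
        triangular_bank(mel_edges(f1, f2, n_mel), n_fft, sr),
        triangular_bank(inverse_mel_edges(f2, sr / 2.0, n_imel), n_fft, sr),
    ])

def dct2(x, n_coeffs):
    # Unnormalised DCT-II of a 1-D vector, keeping the first n_coeffs terms.
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T

def lmi_cepstrum(frame, bank, n_coeffs=20):
    # Windowed power spectrum -> filter-bank energies -> log -> DCT.
    n_fft = 2 * (bank.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame * np.hamming(frame.size), n_fft)) ** 2
    energies = bank @ power
    return dct2(np.log(energies + 1e-10), n_coeffs)

if __name__ == "__main__":
    bank = lmi_bank()
    frame = np.random.default_rng(0).standard_normal(400)  # 25 ms at 16 kHz
    print(bank.shape, lmi_cepstrum(frame, bank).shape)
```

Because each sub-bank is built only from edges inside its own frequency range, the three groups of filters do not overlap, which is one plausible reading of "avoid superposition of characteristic parameters". Per-frame vectors from `lmi_cepstrum` would then be passed to whatever classifier is used for the genuine-versus-playback decision.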

Acknowledgements

This work was funded by the Natural Science Foundation of Jiangsu Province (Project No. BK20150987) and supported by the College of Information Engineering, Nanjing University of Finance & Economics. The authors also thank the ASVspoof2017 challenge for providing the database.

Author information

Corresponding author

Correspondence to Ye Jiang.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhou, J., Jiang, Y. (2020). Playback Speech Detection Application Based on Cepstrum Feature. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_24

  • DOI: https://doi.org/10.1007/978-981-15-2810-1_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2809-5

  • Online ISBN: 978-981-15-2810-1

  • eBook Packages: Computer Science (R0)
