Playback Speech Detection Application Based on Cepstrum Feature

Zhou, Jing; Jiang, Ye

doi:10.1007/978-981-15-2810-1_24


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1179)

Included in the following conference series:

International Conference on Data Service

Abstract

With the popularity of portable recording devices, playback speech has become one of the most important means of attack on speaker authentication systems. Comparison with the original speech data shows that playback speech differs in the high-frequency layer, and that it also differs in the low-frequency layer because of the different recording equipment. Based on this finding, a detection algorithm is presented to extract representative data. In the high-frequency layer, inverse-Mel filters (I-Mel) are used to extract speaker eigenvector sequences. In the low-frequency layer, linear filters (Linear) are combined with Mel filters (Mel) to avoid superposition of characteristic parameters. Multi-layer fusion yields the L-M-I filter bank, from which new cepstral features are formed. The experimental results show that the method detects playback speech effectively, with an equal error rate of 2.63%. Compared with traditional feature extraction methods (MFCC, CQCC, LFCC, IMFCC), the equal error rate decreases by 12.79%, 9.61%, 4.45% and 3.28%, respectively.
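The abstract does not give the band boundaries, filter counts, or cepstral order of the L-M-I filter bank, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes linear spacing in the lowest band, Mel spacing in the next band, and mirrored (inverse) Mel spacing in the highest band. The function names, the split frequencies (1000 Hz and 4000 Hz), and the filter counts (10, 10, 20) are all hypothetical choices made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common 2595*log10(1 + f/700) Mel mapping.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_bank(edges_hz, n_fft, sample_rate):
    # One triangular filter per consecutive triple of edge frequencies.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((len(edges_hz) - 2, freqs.size))
    for i in range(len(edges_hz) - 2):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def lmi_filter_bank(sample_rate=16000, n_fft=512,
                    splits=(1000.0, 4000.0), n_filters=(10, 10, 20)):
    # ASSUMPTION: band splits and filter counts are illustrative only.
    f1, f2 = splits
    nyquist = sample_rate / 2.0
    n_lin, n_mel, n_imel = n_filters
    # Linear spacing in [0, f1].
    lin_edges = np.linspace(0.0, f1, n_lin + 2)
    # Mel spacing in [f1, f2]: dense at the low end of the band.
    mel_edges = mel_to_hz(np.linspace(hz_to_mel(f1), hz_to_mel(f2), n_mel + 2))
    # Inverse-Mel spacing in [f2, nyquist]: Mel edges mirrored within the
    # band, so the filters are dense at the high end.
    hi_mel = mel_to_hz(np.linspace(hz_to_mel(f2), hz_to_mel(nyquist), n_imel + 2))
    imel_edges = (f2 + nyquist) - hi_mel[::-1]
    # The three banks cover disjoint bands, so they do not overlap.
    return np.vstack([
        triangular_bank(lin_edges, n_fft, sample_rate),
        triangular_bank(mel_edges, n_fft, sample_rate),
        triangular_bank(imel_edges, n_fft, sample_rate),
    ])

def cepstral_features(power_spec, bank, n_ceps=13):
    # Log filter-bank energies followed by an unnormalized DCT-II.
    energies = np.maximum(power_spec @ bank.T, 1e-10)
    n = bank.shape[0]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return np.log(energies) @ dct.T

# Usage: power_spec has shape (frames, n_fft // 2 + 1).
bank = lmi_filter_bank()
power_spec = np.random.default_rng(0).random((5, 257))
features = cepstral_features(power_spec, bank)
```

Stacking the three sub-banks over disjoint bands is one way to read the abstract's remark about avoiding superposition of characteristic parameters; the paper itself may fuse the layers differently.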



Acknowledgements

This work was funded by the Natural Science Foundation of Jiangsu Province (Project No. BK20150987) and supported by the College of Information Engineering, Nanjing University of Finance & Economics. In addition, the authors thank the organizers of the ASVspoof2017 challenge for providing the database.

Author information

Authors and Affiliations

College of Information Engineering, Nanjing University of Finance and Economics, Nanjing, 210023, Jiangsu, China
Jing Zhou & Ye Jiang


Corresponding author

Correspondence to Ye Jiang.

Editor information

Editors and Affiliations

Swinburne University of Technology, Melbourne, VIC, Australia
Jing He
University of Illinois at Chicago, Chicago, USA
Philip S. Yu
College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA
Yong Shi
Research Institute of Extenics and Innovation Methods, Guangdong University of Technology, Guangzhou, China
Xingsen Li
Ningbo University, Ningbo, China
Zhijun Xie
Deakin University, Burwood, VIC, Australia
Guangyan Huang
Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
Jie Cao
Nanjing University of Posts and Telecommunications, Nanjing, China
Fu Xiao


About this paper

Cite this paper

Zhou, J., Jiang, Y. (2020). Playback Speech Detection Application Based on Cepstrum Feature. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_24


DOI: https://doi.org/10.1007/978-981-15-2810-1_24
Published: 02 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2809-5
Online ISBN: 978-981-15-2810-1
eBook Packages: Computer Science, Computer Science (R0)
