Combination of Amplitude and Frequency Modulation Features for Presentation Attack Detection

  • SI: ISCSLP 2018 -- invitation only
Journal of Signal Processing Systems

Abstract

In this paper, we propose a combination of Amplitude Modulation and Frequency Modulation (AM-FM) features for the replay Spoof Speech Detection (SSD) task. The AM components are known to be affected by noise (in this case, noise introduced by the replay mechanism). In particular, we exploit this degradation of the AM component, together with the corresponding Instantaneous Frequency (IF), for the SSD task. Thus, the novelty of the proposed Amplitude Weighted Frequency Cepstral Coefficients (AWFCC) feature set lies in using the frequency components along with squared weighted amplitude components that are degraded by replay noise. The AWFCC feature set contains information from both the AM and FM components and hence provides discriminative information in the spectral characteristics. Experiments were performed on the publicly available ASVspoof 2017 challenge version 1.0 and 2.0 databases. We compared the proposed feature set with other state-of-the-art feature sets, namely Constant Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and Mel Frequency Cepstral Coefficients (MFCC), using a simple Gaussian Mixture Model (GMM) classifier. On its own, the AWFCC feature set obtained a lower % Equal Error Rate (EER) than the other feature sets on both the version 1.0 and 2.0 databases. Furthermore, we used score-level fusion to exploit possible complementary information between two feature sets and reduce the % EER further. The score-level fusion of the CQCC and AWFCC feature sets gave 5.75 % and 10.42 % EER on the development and evaluation sets, respectively, of the ASVspoof 2017 version 2.0 database. Moreover, for the evaluation set, we also studied the performance of the proposed feature set under different Replay Configurations (RC), namely, acoustic environments, playback devices, and recording devices. For all levels of threat condition (i.e., low, medium, and high) to the Automatic Speaker Verification (ASV) system, the proposed feature set performed better than the existing state-of-the-art feature sets.
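The two evaluation ideas named in the abstract, % EER and score-level fusion, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the fusion rule or its weights, so the convex combination with a hypothetical weight `alpha` is only one common choice; the function names `compute_eer` and `fuse_scores` are introduced here for illustration; and the scores are invented toy values standing in for per-trial classifier scores, where a higher score is assumed to mean "more likely genuine".

```python
from typing import List, Sequence


def compute_eer(genuine: Sequence[float], spoof: Sequence[float]) -> float:
    """Return the equal error rate in percent.

    Assumes higher scores mean 'more likely genuine'. The EER is taken at the
    observed threshold where false acceptance and false rejection rates are
    closest, and reported as their average at that threshold.
    """
    if not genuine or not spoof:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = float("inf"), 100.0
    for t in sorted(set(genuine) | set(spoof)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), 100.0 * (far + frr) / 2.0
    return eer


def fuse_scores(a: Sequence[float], b: Sequence[float], alpha: float) -> List[float]:
    """Score-level fusion as a convex combination (alpha is hypothetical)."""
    if len(a) != len(b):
        raise ValueError("both systems must score the same trials")
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


# Toy scores for the same four genuine and four spoofed trials, as if produced
# by two different feature sets (values invented for illustration only).
genuine_a, spoof_a = [2.0, 1.5, -0.5, 1.0], [-1.0, -2.0, 0.0, -1.5]
genuine_b, spoof_b = [0.5, -0.4, 1.5, 1.0], [-1.0, -0.5, -2.0, -0.2]

fused_genuine = fuse_scores(genuine_a, genuine_b, alpha=0.5)
fused_spoof = fuse_scores(spoof_a, spoof_b, alpha=0.5)

print(compute_eer(genuine_a, spoof_a),
      compute_eer(genuine_b, spoof_b),
      compute_eer(fused_genuine, fused_spoof))
# prints: 25.0 25.0 0.0
```

In this toy case each system misranks a different trial, so averaging their scores separates the two classes completely. Real systems rarely complement each other this cleanly, and in practice the fusion weight would be tuned on a development set rather than fixed at 0.5.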

Acknowledgments

The authors thank the organizers of the special issue of the Springer Journal of Signal Processing Systems for ISCSLP 2018 and the organizers of the ASVspoof 2017 Challenge campaign. They also thank the University Grants Commission (UGC) for providing the Rajiv Gandhi National Fellowship (RGNF), and the authorities of DA-IICT Gandhinagar for their kind support and cooperation in carrying out this research work.

Author information

Corresponding author

Correspondence to Madhu R. Kamble.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kamble, M.R., Patil, H.A. Combination of Amplitude and Frequency Modulation Features for Presentation Attack Detection. J Sign Process Syst 92, 777–791 (2020). https://doi.org/10.1007/s11265-020-01532-3

  • DOI: https://doi.org/10.1007/s11265-020-01532-3
