Combination of Amplitude and Frequency Modulation Features for Presentation Attack Detection

Kamble, Madhu R.; Patil, Hemant A.

doi:10.1007/s11265-020-01532-3

SI: ISCSLP 2018 -- invitation only
Published: 15 April 2020

Volume 92, pages 777–791, (2020)

Journal of Signal Processing Systems

Madhu R. Kamble¹ & Hemant A. Patil¹

Abstract

In this paper, we propose combining Amplitude Modulation and Frequency Modulation (AM-FM) features for the replay Spoof Speech Detection (SSD) task. The AM components are known to be affected by noise (in this case, noise introduced by the replay mechanism). In particular, we exploit this damage in the AM component together with the corresponding Instantaneous Frequency (IF) for the SSD task. Thus, the novelty of the proposed Amplitude Weighted Frequency Cepstral Coefficients (AWFCC) feature set lies in using frequency components along with squared weighted amplitude components that are degraded by replay noise. The AWFCC feature set carries information from both the AM and FM components and hence provides discriminatory information in the spectral characteristics. Experiments were performed on the publicly available ASVspoof 2017 challenge version 1.0 and 2.0 databases using the AWFCC feature set. We compared the proposed feature set with other state-of-the-art feature sets, namely Constant Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and Mel Frequency Cepstral Coefficients (MFCC), using a simple Gaussian Mixture Model (GMM) classifier. On its own, the AWFCC feature set obtained a lower % Equal Error Rate (EER) than the other feature sets on both the version 1.0 and 2.0 databases. Furthermore, we used score-level fusion to exploit possible complementary information between two feature sets and reduce the % EER further. The score-level fusion of the CQCC and AWFCC feature sets gave 5.75 % and 10.42 % EER on the development and evaluation sets, respectively, of the ASVspoof 2017 version 2.0 database. Moreover, for the evaluation dataset, we also studied the performance of the proposed feature set under different Replay Configurations (RC), namely acoustic environments, playback devices, and recording devices. For all levels of threat to the Automatic Speaker Verification (ASV) system (i.e., low, medium, and high), the proposed feature set performed better than the existing state-of-the-art feature sets.
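
To make the amplitude-weighting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single narrowband signal, estimates its instantaneous amplitude and frequency with the DESA-2 energy separation algorithm built on the Teager energy operator, and then averages the IF using squared-amplitude weights. The function names (teager, desa2, weighted_mean_if), the choice of DESA-2, and the toy tone are assumptions made for illustration. The abstract does not specify the filterbank, framing, or cepstral steps that turn such estimates into AWFCC, so those are omitted.

```python
import numpy as np


def teager(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous amplitude and frequency (rad/sample).

    Assumes x is narrowband with frequency below a quarter of the sampling rate.
    The outputs are 4 samples shorter than x because of the edge trimming.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                        # y(n) = x(n+1) - x(n-1)
    psi_x = np.maximum(teager(x)[1:-1], eps)  # trimmed to align with psi_y
    psi_y = np.maximum(teager(y), eps)        # floor avoids division by zero
    omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(psi_y)
    return amp, omega


def weighted_mean_if(x, fs):
    """Squared-amplitude-weighted mean instantaneous frequency in Hz."""
    amp, omega = desa2(x)
    weights = amp ** 2
    mean_omega = np.sum(weights * omega) / np.sum(weights)
    return float(mean_omega) * fs / (2.0 * np.pi)


# Toy check on a pure 1 kHz tone sampled at 16 kHz (hypothetical values).
fs = 16000
n = np.arange(400)
tone = 0.5 * np.cos(2.0 * np.pi * 1000.0 * n / fs)
f_w = weighted_mean_if(tone, fs)
```

For a clean tone the weighted mean simply recovers the tone frequency. The premise of the abstract is that replay distorts the amplitude envelope, which in a sketch like this would change the weights applied to the IF.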
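
The evaluation metric and the fusion step can be sketched in a similar spirit. The abstract does not state the fusion rule or its weights, so the convex linear combination below, the weight name alpha, and the toy scores are assumptions for illustration only; they are not results from the paper. Scores are assumed to follow the convention that a higher value means "more likely genuine".

```python
import numpy as np


def fuse_scores(scores_a, scores_b, alpha):
    """Convex combination of two systems' per-trial scores, alpha in [0, 1]."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return alpha * a + (1.0 - alpha) * b


def eer_percent(genuine, spoof):
    """Equal error rate in percent, found by sweeping the observed scores."""
    genuine = np.asarray(genuine, dtype=float)
    spoof = np.asarray(spoof, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # A trial is accepted as genuine when its score is >= the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])  # false rejections
    far = np.array([(spoof >= t).mean() for t in thresholds])   # false acceptances
    i = int(np.argmin(np.abs(far - frr)))
    return 100.0 * (far[i] + frr[i]) / 2.0


# Hypothetical scores from two feature sets on the same four genuine
# and four spoofed trials.
genuine_a, spoof_a = [2.0, 3.0, 4.0, 5.0], [-1.0, 0.0, 1.0, 3.5]
genuine_b, spoof_b = [1.0, 4.0, 2.0, 3.0], [0.0, -1.0, -2.0, -3.0]

eer_a = eer_percent(genuine_a, spoof_a)
eer_fused = eer_percent(
    fuse_scores(genuine_a, genuine_b, alpha=0.5),
    fuse_scores(spoof_a, spoof_b, alpha=0.5),
)
```

In practice the fusion weight would be tuned on a development set and then applied unchanged to the evaluation set, which matches the abstract's separate reporting of development and evaluation EER.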

Acknowledgments

The authors would like to thank the organizers of the special issue of the Springer Journal of Signal Processing Systems for ISCSLP 2018, and also the organizers of the ASVspoof 2017 Challenge campaign. In addition, they thank the University Grants Commission (UGC) for providing the Rajiv Gandhi National Fellowship (RGNF) and the authorities of DA-IICT Gandhinagar for their kind support and co-operation in carrying out this research work.

Author information

Authors and Affiliations

Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar-382007, Gujarat, India
Madhu R. Kamble & Hemant A. Patil

Corresponding author

Correspondence to Madhu R. Kamble.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kamble, M.R., Patil, H.A. Combination of Amplitude and Frequency Modulation Features for Presentation Attack Detection. J Sign Process Syst 92, 777–791 (2020). https://doi.org/10.1007/s11265-020-01532-3

Received: 19 February 2019
Revised: 13 March 2020
Accepted: 17 March 2020
Published: 15 April 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11265-020-01532-3
