Combining evidences from Hilbert envelope and residual phase for detecting replay attacks

Singh, Madhusudan; Pati, Debadatta

doi:10.1007/s10772-019-09604-x

Published: 04 March 2019

Volume 22, pages 313–326 (2019)

International Journal of Speech Technology

Abstract

In this work, the Hilbert envelope of the linear prediction (LP) residual and the residual phase are explored for detecting replay attacks. Two source features, namely the LP residual Hilbert envelope mel frequency cepstral coefficients (LPRHEMFCC) and the residual phase cepstral coefficients (RPCC), are used for replay detection. From a signal perspective, the Hilbert envelope represents the amplitude information of the LP residual samples, while the residual phase represents the excitation information present in the sequence of LP residual samples. Hence, both can be considered as two components of the raw LP residual signal. Accordingly, the score-level fusion of the LPRHEMFCC and RPCC features is compared with a third source feature, the residual mel frequency cepstral coefficients (RMFCC), derived directly from the raw LP residual obtained by LP analysis. The comparative analysis is performed using Gaussian mixture model-universal background model (GMM-UBM) automatic speaker verification (ASV) experiments on the IITG-MV replay database and spoof detection experiments on the ASVspoof 2017 database. RFAR and ZFAR denote the false acceptance rates under replay attacks and zero-effort impostor attacks, respectively. On the IITG-MV database, the (LPRHEMFCC + RPCC) + MFCC combination achieves relative RFAR-ZFAR improvements of 86.10% (males), 27.45% (females) and 54.14% (whole set) over the RMFCC + MFCC combination. In terms of the tandem detection cost function (t-DCF), the corresponding relative improvements are 40.50%, 13.13% and 26.16%. On the ASVspoof 2017 database, (LPRHEMFCC + RPCC) + MFCC and (LPRHEMFCC + RPCC) + CQCC achieve relative equal error rate (EER) improvements of 11.72% and 6.74% over RMFCC + MFCC and RMFCC + CQCC, respectively. These observations support processing the Hilbert envelope and residual phase components of the LP residual, rather than the raw LP residual directly, for detecting replay attacks. Moreover, the score-level fusion of LPRHEMFCC, RPCC and CQCC yields an EER of 8.86%.
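The abstract's central idea is that the LP residual e[n] splits into a Hilbert envelope h_e[n] (amplitude) and a residual phase cos θ[n] (excitation timing), whose product recovers e[n]. The following is a minimal sketch of that decomposition, not the authors' implementation. It assumes autocorrelation-method LP solved by Levinson-Durbin, a Hamming window for coefficient estimation, LP order 10, and the residual-phase definition cos θ[n] = e[n] / h_e[n] of Murty and Yegnanarayana (IEEE Signal Processing Letters, 2006); the paper's actual frame size, LP order and windowing are not stated here. All function names and the synthetic test frame are hypothetical. Only the standard library is used (an O(N²) DFT) so the block is self-contained; `scipy.signal.hilbert` builds the same analytic signal with an FFT.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (applied only when estimating LP coefficients)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def lp_coefficients(frame, order):
    """Autocorrelation-method LP solved with the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, i.e. the inverse filter
    A(z) = 1 + sum_{k=1}^{order} a[k] z^-k.
    """
    w = [s * g for s, g in zip(frame, hamming(len(frame)))]
    r = [sum(w[i] * w[i + k] for i in range(len(w) - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop the recursion
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction-error energy after order i
    return a


def lp_residual(frame, a):
    """Inverse-filter the unwindowed frame: e[n] = x[n] + sum_k a[k] x[n-k]."""
    p = len(a) - 1
    return [frame[n] + sum(a[k] * frame[n - k] for k in range(1, min(p, n) + 1))
            for n in range(len(frame))]


def analytic_signal(e):
    """Return e[n] + j*H{e}[n] using the DFT construction (as in scipy.signal.hilbert)."""
    n = len(e)
    spec = [sum(e[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]
    # Keep DC (and Nyquist for even n), double positive, zero negative frequencies.
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    for f in range(1, (n + 1) // 2):
        h[f] = 2.0
    return [sum(h[f] * spec[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def envelope_and_residual_phase(e):
    """Hilbert envelope h_e[n] = |z[n]| and residual phase cos(theta[n]) = e[n] / h_e[n]."""
    z = analytic_signal(e)
    env = [abs(v) for v in z]
    phase = [v.real / m if m > 0.0 else 0.0 for v, m in zip(z, env)]
    return env, phase


# Hypothetical test frame: an impulse train (period 20) driving a 2-pole resonator,
# standing in for a voiced speech segment (not data from the paper).
N, PERIOD, ORDER = 64, 20, 10
x = []
for n in range(N):
    u = 1.0 if n % PERIOD == 0 else 0.0
    x1 = x[n - 1] if n >= 1 else 0.0
    x2 = x[n - 2] if n >= 2 else 0.0
    x.append(u + 1.6 * x1 - 0.9 * x2)

a = lp_coefficients(x, ORDER)
e = lp_residual(x, a)
env, phase = envelope_and_residual_phase(e)
# The product h_e[n] * cos(theta[n]) recovers e[n] up to round-off.
print(max(abs(e[i] - env[i] * phase[i]) for i in range(N)))
```

The residual is computed on the unwindowed frame while the coefficients come from a windowed copy; this is a common choice but an assumption here. Presumably LPRHEMFCC and RPCC are then obtained by cepstral processing of sequences like `env` and `phase`; that step is not attempted in this sketch.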

Acknowledgements

This research work is funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, through the project “Development of Excitation Source Features Based Spoof Resistant and Robust Audio-Visual Person Identification System”. The research work was carried out in the Speech Processing and Pattern Recognition (SPARC) laboratory at the National Institute of Technology Nagaland, Dimapur, India.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Nagaland, Dimapur, 797103, India
Madhusudan Singh & Debadatta Pati

Corresponding author

Correspondence to Madhusudan Singh.

Additional information

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, M., Pati, D. Combining evidences from Hilbert envelope and residual phase for detecting replay attacks. Int J Speech Technol 22, 313–326 (2019). https://doi.org/10.1007/s10772-019-09604-x

Received: 01 October 2018
Accepted: 26 February 2019
Published: 04 March 2019
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10772-019-09604-x
