Abstract
In this work, the Hilbert envelope of the linear prediction (LP) residual and the residual phase are explored for detecting replay attacks. Two source features, namely the LP residual Hilbert envelope mel frequency cepstral coefficient (LPRHEMFCC) and the residual phase cepstral coefficient (RPCC), are used for replay detection. From a signal perspective, the Hilbert envelope represents the amplitude information of the LP residual samples, while the residual phase represents the excitation information present in the sequence of LP residual samples. Hence, both can be considered as two components of the raw LP residual signal. Accordingly, the score-level fusion of the LPRHEMFCC and RPCC features is compared with a third source feature, the residual mel frequency cepstral coefficient (RMFCC), which is derived from the raw LP residual obtained by LP analysis. The comparative analysis is performed using Gaussian mixture model-universal background model (GMM-UBM) automatic speaker verification (ASV) experiments on the IITG-MV replay database and spoof detection experiments on the ASVspoof 2017 database. For the IITG-MV database, relative (RFAR-ZFAR) improvements of 86.10% (males), 27.45% (females) and 54.14% (whole set) are achieved for the (LPRHEMFCC + RPCC) + MFCC combination over the RMFCC + MFCC combination. Here, RFAR and ZFAR denote the false acceptance rates under replay attacks and zero-effort impostor attacks, respectively. In terms of the tandem detection cost function (t-DCF) metric, the corresponding relative improvements are 40.50%, 13.13% and 26.16%, respectively. For the ASVspoof 2017 database, relative equal error rate (EER) improvements of 11.72% and 6.74% are achieved for (LPRHEMFCC + RPCC) + MFCC and (LPRHEMFCC + RPCC) + CQCC over RMFCC + MFCC and RMFCC + CQCC, respectively, where CQCC denotes the constant Q cepstral coefficients. These observations justify exploring the Hilbert envelope and residual phase components of the LP residual, rather than processing the LP residual signal directly, for detecting replay attacks. Moreover, the score-level fusion of LPRHEMFCC, RPCC and CQCC achieves an EER of 8.86%.
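The decomposition of the LP residual into its Hilbert envelope and residual phase can be illustrated with a minimal sketch. It is not the authors' implementation: the LP order of 10, the 16 kHz sampling rate, the 25 ms Hamming-windowed synthetic frame, and the function names `lp_residual`, `analytic_signal` and `envelope_and_phase` are all assumptions made for illustration. The residual phase is taken here as the cosine of the phase of the analytic signal, that is, the residual divided by its Hilbert envelope. The later cepstral steps that would lead to LPRHEMFCC and RPCC are not shown.

```python
import numpy as np


def lp_residual(frame, order=10):
    """LP residual of one frame using the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small regularisation (assumption)
    a = np.linalg.solve(R, r[1:])
    # Prediction s_hat(n) = sum_k a_k * s(n - k); residual = s(n) - s_hat(n).
    pred = np.zeros(n)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * frame[: n - k]
    return frame - pred


def analytic_signal(x):
    """Analytic signal obtained by zeroing the negative-frequency FFT bins."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def envelope_and_phase(residual, eps=1e-12):
    """Hilbert envelope (amplitude) and residual phase (cosine of the phase)."""
    z = analytic_signal(residual)
    envelope = np.abs(z)
    phase = np.real(z) / (envelope + eps)
    return envelope, phase


# Synthetic 25 ms frame at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
frame = (frame + 0.01 * rng.standard_normal(400)) * np.hamming(400)

residual = lp_residual(frame, order=10)
envelope, phase = envelope_and_phase(residual)
```

Multiplying `envelope` by `phase` recovers `residual` up to numerical precision, which is the sense in which the two can be viewed as components of the raw LP residual.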






Acknowledgements
This research work is funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, through the project “Development of Excitation Source Features Based Spoof Resistant and Robust Audio-Visual Person Identification System”. The work was carried out in the Speech Processing and Pattern Recognition (SPARC) laboratory at the National Institute of Technology Nagaland, Dimapur, India.
Cite this article
Singh, M., Pati, D. Combining evidences from Hilbert envelope and residual phase for detecting replay attacks. Int J Speech Technol 22, 313–326 (2019). https://doi.org/10.1007/s10772-019-09604-x