Abstract
Identity authentication based on Automatic Speaker Verification (ASV) has attracted extensive attention, and voice can substitute for a password in many applications. However, the security of current ASV systems is seriously challenged by malicious spoofing attacks. Among these, the replay attack is one of the biggest threats: an adversary uses a pre-recorded speech sample of the legitimate user to gain access to the ASV system. In this paper, we present a replay attack detection (RAD) scheme that distinguishes normal speech from replayed speech. We focus on the distortion introduced by the loudspeaker, namely low-frequency attenuation and high-frequency harmonics, and present a suite of RAD features, DL-RAD, comprising the Harmonic Energy Ratio (HER), Low Spectral Ratio (LSR), Low Spectral Variance (LSV), and Low Spectral Difference Variance (LSDV), to describe how the normal speech signal differs from the replayed speech signal. A support vector machine (SVM) is adopted as the classifier to evaluate the performance of these features. Experimental results show that the True Positive Rate (TPR) and True Negative Rate (TNR) of the proposed method are about 98.15% and 98.75%, respectively, significantly better than those of the existing scheme. The proposed scheme can be applied to both text-dependent and text-independent ASV systems.
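The two reported metrics follow the standard definitions TPR = TP / (TP + FN) and TNR = TN / (TN + FP). The following is a minimal sketch of that arithmetic only, not the authors' evaluation code. The function name `tpr_tnr`, the label encoding, and the toy labels are hypothetical, and the abstract does not state which class (normal or replayed speech) is treated as positive.

```python
def tpr_tnr(y_true, y_pred, positive=1):
    """Return (TPR, TNR) for binary labels.

    Assumption: `positive` marks whichever class the evaluation treats
    as positive; the abstract does not say which one that is.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1          # positive sample correctly accepted
            else:
                fn += 1          # positive sample wrongly rejected
        else:
            if pred == positive:
                fp += 1          # negative sample wrongly accepted
            else:
                tn += 1          # negative sample correctly rejected
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy, made-up labels purely for illustration (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
tpr, tnr = tpr_tnr(y_true, y_pred)
print(f"TPR = {tpr:.2%}, TNR = {tnr:.2%}")
```

Reporting both rates separately, as the abstract does, shows the error on each class rather than a single accuracy figure that could hide an imbalance between them.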
This work was supported by the National Natural Science Foundation of China (NSFC) under grants No. U1536114, No. 61872275, and No. U1536204, and by the China Scholarship Council.
About this article
Cite this article
Ren, Y., Fang, Z., Liu, D. et al. Replay attack detection based on distortion by loudspeaker for voice authentication. Multimed Tools Appl 78, 8383–8396 (2019). https://doi.org/10.1007/s11042-018-6834-3
DOI: https://doi.org/10.1007/s11042-018-6834-3