ABSTRACT
Spoofing cues such as reverberation, channel effects, and environmental noise are intertwined with the genuine speaker's voice, which makes replay attack detection challenging. In this study, we propose a novel approach that makes full use of the replay cues in a whole utterance by extracting different features from voiced and non-voiced segments and training a separate Gaussian Mixture Model for each. First, a joint voice activity detector is adopted to obtain accurate boundaries between the different segments. We then extract Constant-Q Cepstral Coefficients from the voiced segments and Inverse Mel Frequency Cepstral Coefficients from the non-voiced segments. Finally, a Score Calibrator Toolkit is used to fuse the scores of the voiced and non-voiced segments. Results on the evaluation set of the ASVspoof 2017 V2.0 corpus show that the proposed method yields an 18.4% relative reduction in equal error rate compared to the CQCC-CMVN baseline system.
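The scoring and fusion stage described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes diagonal-covariance GMMs whose parameters are already trained, scores each segment stream with a genuine-versus-spoof log-likelihood ratio, and fuses the two streams with a fixed linear combination. The function names (`gmm_loglik`, `llr`, `fuse`) and the fusion weights `a`, `b`, `c` are hypothetical; the abstract does not state how the calibration toolkit computes its fusion.

```python
import math

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: list of feature vectors (e.g. cepstral coefficients per frame).
    weights, means, variances: one entry per mixture component.
    """
    total = 0.0
    for x in frames:
        comps = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comps.append(ll)
        # Log-sum-exp over components for numerical stability.
        m = max(comps)
        total += m + math.log(sum(math.exp(c - m) for c in comps))
    return total / len(frames)

def llr(frames, genuine_gmm, spoof_gmm):
    """Log-likelihood ratio: positive values favour genuine speech."""
    return gmm_loglik(frames, *genuine_gmm) - gmm_loglik(frames, *spoof_gmm)

def fuse(voiced_score, nonvoiced_score, a=0.5, b=0.5, c=0.0):
    """Linear score fusion; a, b, c are assumed placeholder weights."""
    return a * voiced_score + b * nonvoiced_score + c
```

In use, the voiced frames would be scored against the GMM pair trained on voiced-segment features and the non-voiced frames against the pair trained on non-voiced-segment features, with each GMM passed as a `(weights, means, variances)` tuple; the two resulting ratios are then passed to `fuse`. In practice the fusion weights would be learned on a development set rather than fixed by hand.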