Abstract
This work presents modules for authenticating persons using speech as a biometric, with protection against recorded playback attacks. It covers the implementation of feature extraction, a modeling technique, and a testing procedure for authentication. Playback attacks are simulated by playing back and recording the original speech utterances using a laptop's speakers and microphone with Audacity software. The work focuses on distinguishing original speech from recorded speech and on authenticating speakers by voice. Features extracted from the original and recorded speech are used to develop a model for each class. Voice passwords are assigned to the speakers, and features are extracted from training speech created by fusing the password-specific original speech utterances. These features are passed to the training algorithm to generate password-specific speaker models. Testing involves extracting features from the test speech and applying them to the original and recorded speech models. If the test speech is classified as recorded speech, it is rejected and not processed further. If it is classified as original speech, its feature vectors are applied to the password-specific speaker models, and the speaker is identified and authenticated according to the classification criteria. The system is found to be robust against playback attacks and performs well in authenticating the sixteen speakers considered in this work. Passwords are isolated words and digits chosen from the “TIMIT” speech database. The work is also extended to the AVSpoof database to authenticate 44 speakers against replay attacks, with performance analyzed in terms of rejection rate.
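The two-stage test procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not name the modeling technique or the classification criterion, so this example assumes each model is a small codebook of feature vectors and that the decision is made by minimum average distortion. The function names (`mean_distortion`, `authenticate`), the model labels, and the two-dimensional toy features are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Optional, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def mean_distortion(features: Sequence[Vector], codebook: Codebook) -> float:
    """Average Euclidean distance from each feature vector to its nearest codeword."""
    total = 0.0
    for vec in features:
        total += min(math.dist(vec, code) for code in codebook)
    return total / len(features)


def authenticate(
    features: Sequence[Vector],
    original_model: Codebook,
    recorded_model: Codebook,
    speaker_models: Dict[str, Codebook],
) -> Optional[str]:
    """Return the best-matching speaker/password label, or None for a suspected playback."""
    # Stage 1: decide whether the test speech looks original or recorded.
    # Assumption: the class with the lower distortion wins.
    if mean_distortion(features, recorded_model) < mean_distortion(features, original_model):
        return None  # treated as a playback attack; no further processing
    # Stage 2: apply the features to every password-specific speaker model
    # and pick the one with the lowest distortion.
    return min(speaker_models, key=lambda name: mean_distortion(features, speaker_models[name]))


# Toy two-dimensional "features"; real systems would use spectral features per frame.
original_model = [[0.0, 0.0], [1.0, 1.0]]
recorded_model = [[5.0, 5.0], [6.0, 6.0]]
speaker_models = {
    "speaker_a/password_1": [[0.0, 0.1]],
    "speaker_b/password_2": [[1.0, 0.9]],
}

live_test = [[0.1, 0.1], [0.0, 0.2]]
replayed_test = [[5.2, 5.1], [5.9, 6.1]]

live_result = authenticate(live_test, original_model, recorded_model, speaker_models)
replay_result = authenticate(replayed_test, original_model, recorded_model, speaker_models)
print(live_result)
print(replay_result)
```

In this sketch a rejected utterance never reaches the speaker models, which mirrors the abstract's statement that recorded speech is prevented from undergoing further processing. A practical system would likely also apply a distortion threshold in the second stage so that unknown speakers are rejected rather than assigned to the closest model.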
Ethics declarations
Competing Interest
The authors have declared that no competing interest exists.
Cite this article
Revathi, A., Jeyalakshmi, C. & Thenmozhi, K. Person authentication using speech as a biometric against play back attacks. Multimed Tools Appl 78, 1569–1582 (2019). https://doi.org/10.1007/s11042-018-6258-0
DOI: https://doi.org/10.1007/s11042-018-6258-0