Abstract
Forensic investigations use personal traits to identify the persons involved in criminal offences. In this work on person authentication, recorded voice samples are used to narrow down the search for such persons. Time-frequency (T-F) features obtained from the concatenated training set of utterances are given to convolutional neural networks (CNNs), whose layers are configured to create templates. T-F features are likewise derived from the test utterances and applied to the CNN templates; based on the resulting matches, recognition accuracy is computed to validate the feature selection and the CNN technique. Decision-level fusion of the features, with CNNs used for modelling and classification, provides an overall authentication rate of 98%. The system is also implemented in real time on Raspberry Pi hardware. This automated system would be helpful in identifying convicts in the forensic sector and in securing online transactions against fraudulent attacks in the financial sector.
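The decision-level fusion step can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so majority voting is assumed here purely for illustration; the function names, speaker labels, and per-feature predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def fuse_decisions(per_feature_labels):
    """Fuse the labels predicted by several feature-specific classifiers.

    Assumption: simple majority voting. On a tie, the label voted by the
    earliest-listed classifier among the tied labels is returned.
    """
    counts = Counter(per_feature_labels)
    top = max(counts.values())
    for label in per_feature_labels:
        if counts[label] == top:
            return label


def authentication_rate(fused_labels, true_labels):
    """Percentage of test utterances whose fused label matches the truth."""
    correct = sum(f == t for f, t in zip(fused_labels, true_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical outputs of three CNNs, each trained on a different T-F
# feature, for four test utterances (one inner list per utterance).
predictions = [
    ["spk1", "spk1", "spk2"],
    ["spk2", "spk3", "spk2"],
    ["spk3", "spk1", "spk3"],
    ["spk2", "spk2", "spk4"],
]
true_labels = ["spk1", "spk2", "spk3", "spk4"]

fused = [fuse_decisions(p) for p in predictions]
rate = authentication_rate(fused, true_labels)
print(fused, rate)
```

In this toy data the individual classifiers disagree on every utterance, and the vote recovers the correct speaker in three of the four cases. Other fusion rules, such as weighting each classifier by its validation accuracy, would fit the same interface.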
Data availability
All relevant data are within the paper and its supporting information files.
Acknowledgements
The authors wish to express their sincere thanks to SASTRA Deemed University, Thanjavur, India, for extending infrastructural support to carry out this work.
Ethics declarations
Ethical approval
This article does not contain any studies performed with human participants or animals.
Conflict of interest
The authors have no relevant conflicts of interest to disclose.
As the authors of the manuscript, we have no direct financial relationship with the commercial identity mentioned in our paper that might lead to a conflict of interest for any of the authors.
About this article
Cite this article
Revathi, A., Sasikaladevi, N. & Raju, N. Real time implementation of voice based robust person authentication using T-F features and CNN. Multimed Tools Appl 83, 31587–31601 (2024). https://doi.org/10.1007/s11042-023-16811-x
DOI: https://doi.org/10.1007/s11042-023-16811-x