
Real time implementation of voice based robust person authentication using T-F features and CNN

Multimedia Tools and Applications

Abstract

Forensic investigations use personal traits to identify the persons involved in criminal offences. In this work on person authentication, recorded voice samples can also be used to narrow down the search for such persons. Time-frequency (T-F) features obtained from the concatenated training set of utterances are given to convolutional neural networks (CNN), whose layers are configured to create templates. Testing utterances are tied, and T-F features are derived from them. These features are applied to the CNN templates, and recognition accuracy is computed from the resulting matches to validate the feature selection and the CNN technique. Decision-level fusion of features, with CNN used for modelling and classification, provides an overall authentication rate of 98%. The system is also implemented in real time on Raspberry Pi hardware. This automated system would be helpful in identifying convicts in forensic sectors and in securing online transactions against fraudulent attacks in financial sectors.
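The decision-level fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so majority voting is assumed here purely for illustration. The function names (`fuse_decisions`, `authenticate`, `accuracy`), the speaker labels, and the toy trial data are all hypothetical and are not taken from the paper. Each trial carries one decision per feature stream, standing in for the output of one CNN per T-F feature.

```python
from collections import Counter


def fuse_decisions(decisions):
    """Fuse per-feature speaker decisions by majority vote.

    `decisions` holds one predicted speaker label per feature stream
    (assumed: one CNN per T-F feature). Ties go to the label that
    appears first in the list.
    """
    if not decisions:
        raise ValueError("at least one decision is required")
    counts = Counter(decisions)
    best = max(counts.values())
    for label in decisions:
        if counts[label] == best:
            return label


def authenticate(claimed_id, decisions):
    """Accept the identity claim only if the fused decision matches it."""
    return fuse_decisions(decisions) == claimed_id


def accuracy(trials):
    """Fraction of trials whose fused decision equals the true speaker.

    `trials` is a list of (true_speaker, decisions) pairs.
    """
    correct = sum(1 for true_id, d in trials if fuse_decisions(d) == true_id)
    return correct / len(trials)


# Toy data (hypothetical): three feature streams, four test utterances.
trials = [
    ("spk1", ["spk1", "spk1", "spk2"]),
    ("spk2", ["spk2", "spk3", "spk2"]),
    ("spk3", ["spk1", "spk3", "spk3"]),
    ("spk4", ["spk1", "spk1", "spk4"]),
]

if __name__ == "__main__":
    print(accuracy(trials))
```

In this toy set the fourth utterance is misattributed because two of the three streams agree on the wrong speaker, which shows that fusion only helps when the individual streams err independently.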


Data availability

All relevant data are within the paper and its supporting information files.

Acknowledgements

The authors wish to express their sincere thanks to SASTRA Deemed University, Thanjavur, India, for extending infrastructural support to carry out this work.

Author information

Corresponding author

Correspondence to A. Revathi.

Ethics declarations

Ethical approval

This article does not contain any studies performed with human participants or animals.

Conflict of interest

The authors have no relevant conflicts of interest to disclose.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

As the authors of the manuscript, we do not have a direct financial relationship with the commercial entity mentioned in our paper that might lead to a conflict of interest for any of the authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Revathi, A., Sasikaladevi, N. & Raju, N. Real time implementation of voice based robust person authentication using T-F features and CNN. Multimed Tools Appl 83, 31587–31601 (2024). https://doi.org/10.1007/s11042-023-16811-x


  • DOI: https://doi.org/10.1007/s11042-023-16811-x
