DOI: 10.1145/3378904.3378908
research-article

Voice Authentication Model for One-time Password Using Deep Learning Models

Published: 09 April 2020

Abstract

This paper explores the possibility of implementing a voice authentication system for one-time passwords (OTP) that combines a speech recognition model with a speaker verification model. The speech recognition model classifies user utterances of random OTP digits in Bahasa Indonesia, and the speaker verification model verifies the identity of the speaker. A long short-term memory network and a siamese network with convolutional neural networks are employed as the models; they aim to recognize and verify human voices represented by MFCC feature vectors. The experiments show that the validation accuracy of the speech recognition model is reliable, yet the speaker verification model does not achieve a satisfactory result.
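The two-model design described in the abstract implies a combined decision: an attempt should pass only when the spoken digits match the issued OTP and the voice matches the enrolled speaker. The following is a minimal sketch of that decision logic, not the authors' implementation. The function names, the use of Euclidean distance between embeddings, and the `threshold` value are all illustrative assumptions; the recognized digit string and the embeddings stand in for the outputs of the speech recognition and speaker verification models.

```python
import math
import secrets
from typing import Sequence


def generate_otp(n_digits: int = 6) -> str:
    """Draw a random digit string that the user is asked to speak."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def authenticate(
    expected_otp: str,
    recognized_digits: str,
    enrolled_embedding: Sequence[float],
    attempt_embedding: Sequence[float],
    threshold: float = 0.5,  # assumed value; would be tuned on validation data
) -> bool:
    """Accept only if the spoken digits match AND the voices are close."""
    # Hypothetical input: digits produced by a speech recognition model.
    digits_ok = secrets.compare_digest(expected_otp, recognized_digits)
    # Hypothetical input: embeddings produced by a speaker verification model.
    distance = euclidean_distance(enrolled_embedding, attempt_embedding)
    return digits_ok and distance <= threshold


if __name__ == "__main__":
    otp = generate_otp()
    # Same digits, similar voice embeddings.
    print(authenticate(otp, otp, [0.1, 0.9], [0.2, 0.8]))
    # Same digits, very different voice embeddings.
    print(authenticate(otp, otp, [0.0, 0.0], [3.0, 4.0]))
```

Requiring both checks means a replayed recording of the right speaker fails on the digits, while an impostor reading the correct digits fails on the voice match. The threshold trades false accepts against false rejects.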


Published In

BDET '20: Proceedings of the 2020 2nd International Conference on Big Data Engineering and Technology
January 2020
126 pages
ISBN:9781450376839
DOI:10.1145/3378904

In-Cooperation

  • National University of Singapore
  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Speech recognition
  2. deep learning
  3. one-time password
  4. speaker verification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BDET 2020

Cited By

  • (2024) Mobile random text-based voice authentication for older adults: A pilot study. Journal of Applied Research on Science and Technology (JARST). DOI: 10.60101/jarst.2023.255839. Online publication date: 13-Aug-2024
  • (2024) Deep learning based authentication schemes for smart devices in different modalities: progress, challenges, performance, datasets and future directions. Multimedia Tools and Applications 83:28 (71451-71493). DOI: 10.1007/s11042-024-18350-5. Online publication date: 8-Feb-2024
  • (2023) An Efficient Voice Authentication System using Enhanced Inceptionv3 Algorithm. Journal of Machine and Computing (379-393). DOI: 10.53759/7669/jmc202303032. Online publication date: 5-Oct-2023
  • (2021) “I...Got my Nose-Print. But it Wasn’t Accurate”: How People with Upper Extremity Impairment Authenticate on their Personal Computing Devices. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3411764.3445070. Online publication date: 6-May-2021
