Voice Authentication Model for One-time Password Using Deep Learning Models

Authors: Janson Hendryli, Dyah Erny Herwindiati

BDET '20: Proceedings of the 2020 2nd International Conference on Big Data Engineering and Technology

Pages 35 - 39

https://doi.org/10.1145/3378904.3378908

Published: 09 April 2020

Abstract

This paper explores the possibility of implementing a voice authentication system for one-time passwords (OTP) that combines a speech recognition model with a speaker verification model. The speech recognition model classifies user utterances of random OTP digits in Bahasa Indonesia, and the speaker verification model verifies the identity of the speaker. A long short-term memory network and a siamese network with convolutional neural networks are employed as the models; both aim to recognize and verify human voices represented by MFCC feature vectors. The experiments show that the validation accuracy of the speech recognition model is reliable, yet the speaker verification model does not achieve satisfactory results.
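The two-stage decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 6-digit OTP length, the Euclidean distance measure, and the 0.5 threshold are all assumptions made for illustration. The outputs of the two neural models (the recognized digits and the speaker embeddings) are taken as plain inputs here rather than computed.

```python
import math
import secrets

# Indonesian words for the digits 0-9, usable as a spoken prompt.
# The exact vocabulary used in the paper is not stated in the abstract.
DIGIT_WORDS = ["nol", "satu", "dua", "tiga", "empat",
               "lima", "enam", "tujuh", "delapan", "sembilan"]


def generate_otp(length=6):
    """Return a random OTP as a list of digits (the length is an assumption)."""
    return [secrets.randbelow(10) for _ in range(length)]


def otp_prompt(digits):
    """Render the OTP digits as the words the user is asked to speak."""
    return " ".join(DIGIT_WORDS[d] for d in digits)


def euclidean_distance(a, b):
    """Distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def authenticate(expected_digits, recognized_digits,
                 enrolled_embedding, probe_embedding, threshold=0.5):
    """Accept only if both checks pass.

    recognized_digits stands in for the speech recognition model's output;
    the two embeddings stand in for the outputs of the siamese network's
    twin branches. The threshold is a hypothetical value that would need
    tuning on validation data.
    """
    digits_ok = list(recognized_digits) == list(expected_digits)
    speaker_ok = euclidean_distance(enrolled_embedding, probe_embedding) < threshold
    return digits_ok and speaker_ok
```

Requiring both checks means a correct OTP spoken by the wrong person is rejected, as is the right speaker reading the wrong digits.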

Index Terms

Voice Authentication Model for One-time Password Using Deep Learning Models
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Security services
    1. Authentication

Published In

BDET '20: Proceedings of the 2020 2nd International Conference on Big Data Engineering and Technology

January 2020

126 pages

ISBN:9781450376839

DOI:10.1145/3378904

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

National University of Singapore
Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

BDET 2020: 2020 2nd International Conference on Big Data Engineering and Technology

January 3 - 5, 2020

Singapore
