ABSTRACT
Speaker recognition, also known as voiceprint recognition, is the task of identifying "who is speaking" from the voice. It is a biometric technology that determines a speaker's identity from the speaker-specific information carried in the speech signal. Drawing on a survey of the speaker recognition literature and related technologies, this paper introduces the two main tasks of speaker recognition, speaker verification and speaker identification, and reviews the models that mark the development of the field. Those models run from the early Gaussian Mixture Model-Universal Background Model (GMM-UBM), through Joint Factor Analysis (JFA) and the i-vector model, to the many new deep-learning-based representations. Along the way recognition performance has steadily improved, and systems can now handle increasingly complex scenarios. Finally, the paper summarizes speaker recognition technology and discusses directions for future research.
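The two tasks differ mainly in the decision they make. Verification is a 1:1 accept/reject decision against a claimed identity. Identification is a 1:N choice among enrolled speakers. The sketch below illustrates this under stated assumptions. Each utterance is assumed to be already mapped to a fixed-length speaker representation, such as an i-vector or a deep-learning embedding. Scoring uses cosine similarity, which is one common back-end; PLDA is another. The helper names, the toy vectors and the 0.7 threshold are hypothetical and for illustration only.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length speaker embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def verify(enrolled: List[float], test: List[float],
           threshold: float = 0.7) -> Tuple[bool, float]:
    """Speaker verification (1:1): accept the claimed identity if the score
    reaches the threshold. The default 0.7 is a hypothetical value; real
    systems tune it on development data."""
    score = cosine_similarity(enrolled, test)
    return score >= threshold, score


def identify(enrolled: Dict[str, List[float]],
             test: List[float]) -> Tuple[str, float]:
    """Closed-set speaker identification (1:N): return the enrolled speaker
    whose embedding scores highest against the test embedding."""
    return max(
        ((name, cosine_similarity(emb, test)) for name, emb in enrolled.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    # Hypothetical toy embeddings; real i-vectors or DNN embeddings have
    # hundreds of dimensions.
    speakers = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
    utterance = [0.85, 0.15, 0.25]
    print(identify(speakers, utterance)[0])  # → alice
    print(verify(speakers["alice"], utterance))
    print(verify(speakers["bob"], utterance))
```

The same scoring function serves both tasks. Only the decision rule changes: a threshold for verification, an arg-max over the enrolled set for closed-set identification.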