DOI: 10.1145/3544109.3544169
research-article

Research on Speaker Recognition Technology Based on Feature Model

Published: 18 July 2022

ABSTRACT

Speaker recognition, also known as voiceprint recognition, identifies who is speaking from the sound of their voice. It is a biometric technology that determines a speaker's identity from the speaker-specific characteristics carried in the voice signal. Drawing on a survey of the speaker recognition literature and related technologies, this paper introduces the two main tasks of speaker recognition, speaker verification and speaker identification, and reviews several models from the development of the field. From the early Gaussian Mixture Model-Universal Background Model (GMM-UBM), through Joint Factor Analysis (JFA) and the i-vector model, to more recent feature models combined with deep learning, recognition performance has steadily improved and the scenarios that can be handled have become more complex. Finally, the paper summarizes speaker recognition technology and discusses directions for future research.
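
The two tasks named in the abstract differ mainly in the decision they make: confirming a claimed identity is a one-to-one comparison (usually called verification), while picking the speaker from an enrolled set is a one-to-many search (usually called identification). The sketch below illustrates only that difference. It is not the paper's method: the three-dimensional "embeddings", the speaker names, the cosine scoring, and the 0.7 threshold are all assumptions chosen for illustration, standing in for whatever features a real system (GMM-UBM, i-vector, or a neural embedding) would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_emb, claimed_emb, threshold=0.7):
    """One-to-one: accept the claimed identity if the score clears a threshold.

    The threshold value is an arbitrary assumption; real systems tune it
    on held-out data.
    """
    return cosine(test_emb, claimed_emb) >= threshold

def identify(test_emb, enrolled):
    """One-to-many: return the enrolled speaker with the highest score."""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))

# Hypothetical enrolled speakers with made-up embeddings.
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.8, 0.3],
}
test_emb = [0.85, 0.2, 0.05]

print(verify(test_emb, enrolled["alice"]))  # claim: "I am alice"
print(verify(test_emb, enrolled["bob"]))    # claim: "I am bob"
print(identify(test_emb, enrolled))         # closest enrolled speaker
```

Verification needs a threshold because it must be able to reject an impostor, whereas closed-set identification as sketched here always returns some enrolled speaker, even when none is a good match.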

  • Published in

    ACM Other conferences
    IPEC '22: Proceedings of the 3rd Asia-Pacific Conference on Image Processing, Electronics and Computers
    April 2022
    1065 pages
    ISBN:9781450395786
    DOI:10.1145/3544109

    Copyright © 2022 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited