DOI: 10.1145/3478905.3478915
research-article

Cantonese speaker recognition system based on EM algorithm in noisy environments

Published: 28 September 2021

Abstract

With the wide application of computers and artificially intelligent machines, voice has come to be seen as the most natural channel for communication between humans and machines, which in turn calls for highly accurate speaker identification systems. Speech signals carry speaker-specific characteristics because different speakers have different vocal tract resonances. In this paper, Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract speaker features, Gaussian Mixture Models (GMMs) are established for the speakers, and Maximum Likelihood Estimation (MLE) is used to estimate the GMM parameters. The aim is to identify Cantonese speakers quickly and accurately in noisy environments. Experimental results indicate that the method is effective in noisy environments.
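The pipeline described in the abstract (per-speaker GMMs fitted by maximum likelihood, then identification by the best-scoring model) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes diagonal covariances, uses the EM algorithm named in the title as the maximum likelihood estimator, and replaces real MFCC frames with synthetic 2-D vectors. The names `synth_frames`, `train_gmm`, `avg_log_likelihood`, and `identify`, as well as all parameter values, are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def component_logs(x, gmm):
    """Log of (weight * density) for every mixture component."""
    weights, means, variances = gmm
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def train_gmm(frames, k=2, iters=20, var_floor=1e-3, seed=0):
    """Fit a k-component diagonal GMM with EM (assumed settings, not the paper's)."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    means = [list(x) for x in rng.sample(frames, k)]
    gmean = [sum(x[d] for x in frames) / n for d in range(dim)]
    gvar = [max(sum((x[d] - gmean[d]) ** 2 for x in frames) / n, var_floor)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(x, (weights, means, variances))
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, frames)) / nj, var_floor)
                for d in range(dim)]
    return weights, means, variances

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one speaker model."""
    return sum(logsumexp(component_logs(x, gmm)) for x in frames) / len(frames)

def identify(frames, models):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))

def synth_frames(centers, n, noise_std=0.0, seed=0):
    """Hypothetical stand-in for MFCC extraction: clustered vectors plus additive noise."""
    rng = random.Random(seed)
    return [[rng.gauss(c, 1.0) + rng.gauss(0.0, noise_std)
             for c in centers[i % len(centers)]] for i in range(n)]

# Two synthetic "speakers" with different feature distributions.
models = {
    "A": train_gmm(synth_frames([(0, 0), (3, 3)], 200, seed=1)),
    "B": train_gmm(synth_frames([(8, -6), (-7, 7)], 200, seed=2)),
}
noisy_test = synth_frames([(0, 0), (3, 3)], 50, noise_std=0.5, seed=3)
print(identify(noisy_test, models))
```

In a real system the synthetic frames would be replaced by MFCC vectors computed from enrollment and test recordings, and the number of mixture components and the variance floor would be tuned on held-out data.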

Published In

ACM Other conferences
DSIT 2021: 2021 4th International Conference on Data Science and Information Technology
July 2021
481 pages
ISBN: 9781450390248
DOI: 10.1145/3478905
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. EM
  2. MFCC
  3. MLE
  4. Speaker recognition
  5. noise

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DSIT 2021

Acceptance Rates

Overall Acceptance Rate 114 of 277 submissions, 41%
