Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Wang, Jia-Ching; Wang, Chien-Yao; Chin, Yu-Hao; Liu, Yu-Ting; Chen, En-Ting; Chang, Pao-Chi

doi:10.1007/s11042-016-3335-0

Published: 17 February 2016

Volume 76, pages 4055–4068, (2017)

Multimedia Tools and Applications

Abstract

This paper proposes a speaker recognition system that uses acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the proposed system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rate by 3.85 % on clean speech and 18.49 % on noisy speech. The experimental results demonstrate the effectiveness of the STRF-based feature for speaker recognition.
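
The three steps described in the abstract (scale energy, logarithm, discrete cosine transform) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the STRF representation is laid out or how the energy is accumulated, so the `strf_frame[scale][rate][frequency]` layout, the sum of squared magnitudes, the unnormalized DCT-II, the `floor` guard against log(0), and the plain concatenation with MFCCs in `combined_feature` are all assumptions made for illustration. All function names are hypothetical.

```python
import math


def scale_energies(strf_frame):
    """Energy per scale for one time frame.

    Assumed layout: strf_frame[s][r][f] holds the STRF response at scale
    index s, rate index r, and frequency channel f. Energy is taken here
    as the sum of squared magnitudes over rate and frequency (assumption).
    """
    return [
        sum(abs(v) ** 2 for rate_row in scale_block for v in rate_row)
        for scale_block in strf_frame
    ]


def dct_ii(x):
    """Unnormalized DCT-II of a sequence x."""
    n = len(x)
    return [
        sum(x[k] * math.cos(math.pi * (k + 0.5) * j / n) for k in range(n))
        for j in range(n)
    ]


def strf_feature(strf_frame, floor=1e-12):
    """Log of the scale energies followed by a DCT.

    `floor` is an illustrative guard so that zero energy does not
    produce log(0).
    """
    log_energies = [math.log(max(e, floor)) for e in scale_energies(strf_frame)]
    return dct_ii(log_energies)


def combined_feature(strf_frame, mfcc):
    """Concatenate the STRF feature with an MFCC vector (assumed fusion)."""
    return strf_feature(strf_frame) + list(mfcc)
```

In this sketch, each frame yields one coefficient per scale, and the combined vector is simply the STRF coefficients followed by the MFCCs; the paper's actual balancing of the two feature types may differ.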

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan
Jia-Ching Wang, Chien-Yao Wang & Yu-Hao Chin
Department of Communication Engineering, National Central University, Jhongli, Taiwan
Yu-Ting Liu, En-Ting Chen & Pao-Chi Chang

Corresponding author

Correspondence to Pao-Chi Chang.

About this article

Cite this article

Wang, JC., Wang, CY., Chin, YH. et al. Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition. Multimed Tools Appl 76, 4055–4068 (2017). https://doi.org/10.1007/s11042-016-3335-0

Received: 03 April 2015
Revised: 05 October 2015
Accepted: 04 February 2016
Published: 17 February 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11042-016-3335-0
