Abstract
This paper proposes Type-2 Information Set (T2IS) features and Hanman Transform (HT) features, both Higher Order Information Set (HOIS) based features, for text-independent speaker recognition. The speech signals of different speakers, represented by Mel Frequency Cepstral Coefficients (MFCC), are converted into T2IS and HT features by accounting for the cepstral and temporal possibilistic uncertainties. The features are classified by the Improved Hanman Classifier (IHC), Support Vector Machine (SVM) and k-Nearest Neighbours (kNN). The proposed approaches are evaluated in terms of speed, computational complexity, memory requirement and accuracy on three datasets, namely NIST-2003, the VoxForge 2014 speech corpus and the VCTK speech corpus, and compared with baseline features such as MFCC, ∆MFCC, ∆∆MFCC and GFCC in a white Gaussian noise environment at different signal-to-noise ratios. The proposed features reduce the feature size, computational time and complexity, and their performance does not degrade in the noisy environment.
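The noisy test condition mentioned above, white Gaussian noise at a chosen signal-to-noise ratio, can be illustrated with a minimal sketch. The abstract does not say how the noise was generated, so this is only the standard construction: the noise variance is set to the signal power divided by 10^(SNR/10). The function names, the 16 kHz sampling rate and the sine tone standing in for speech are all assumptions made for illustration.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at a target SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Stand-in for a speech signal: one second of a 440 Hz tone at an assumed 16 kHz rate.
sample_rate = 16000
clean = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(sample_rate)]
noisy = add_white_gaussian_noise(clean, snr_db=10.0, rng=random.Random(0))
```

Calling `measured_snr_db(clean, noisy)` gives a value close to the requested 10 dB; the small deviation comes from estimating the noise power on a finite number of samples.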
Cite this article
Medikonda, J., Madasu, H. Higher order information set based features for text-independent speaker identification. Int J Speech Technol 21, 451–461 (2018). https://doi.org/10.1007/s10772-017-9472-7
DOI: https://doi.org/10.1007/s10772-017-9472-7