Skip to main content

Higher order information set based features for text-independent speaker identification

  • Published:
International Journal of Speech Technology Aims and scope Submit manuscript

Abstract

In this paper Type-2 Information Set (T2IS) features and Hanman Transform (HT) features as Higher Order Information Set (HOIS) based features are proposed for the text independent speaker recognition. The speech signals of different speakers represented by Mel Frequency Cepstral Coefficients (MFCC) are converted into T2IS features and HT features by taking account of the cepstral and temporal possibilistic uncertainties. The features are classified by Improved Hanman Classifier (IHC), Support Vector Machine (SVM) and k-Nearest Neighbours (kNN). The performance of the proposed approaches is tested in terms of speed, computational complexity, memory requirement and accuracy on three datasets namely NIST-2003, VoxForge 2014 speech corpus and VCTK speech corpus and compared with that of the baseline features like MFCC, ∆MFCC, ∆∆MFCC and GFCC under white Gaussian noisy environment at different signal-to-noise ratios. The proposed features have the reduced feature size, computational time, and complexity and also their performance is not degraded under the noisy environment.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  • Aggarwal, M., & Hanmandlu, M. (2015). Representing uncertainty with information sets. IEEE Transactions on Fuzzy Systems, 24, 1–15.

    Article  Google Scholar 

  • Chang, C.-C., & Lin, C.-J., LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 1–27, 2011.

    Article  Google Scholar 

  • Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28, 357–366.

    Article  Google Scholar 

  • Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech and Signal Processing, 32, 1109–1121.

    Article  Google Scholar 

  • Hanmandlu, M. (2011). Information sets and information processing. Defence Science Journal, 61, 405–407.

    Article  Google Scholar 

  • Hanmandlu, M., & Das, A. (2011). Content-based image retrieval by information theoretic measure. Defence Science Journal, 61, 415–430.

    Article  Google Scholar 

  • Hermansky, H., & Morgan, N. (1994). RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, 2, 578–589.

    Article  Google Scholar 

  • Jawarkar, N. P., Holambe, R. S., & Basu, T. K., Use of fuzzy min-max neural network for speaker identification, In 2011 International Conference on Recent Trends in Information Technology (ICRTIT), 2011, pp. 178–182.

  • Jayanna, H. S., & Prasanna, S. R., & Mahadeva. (2009, Multiple frame size and rate analysis for speaker recognition under limited data condition. IET Signal Processing, 3(3), 189–204.

    Article  MATH  Google Scholar 

  • Kumar, K., Kim, C. & Stern, R. M., Delta-spectral cepstral coefficients for robust speech recognition, In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 4784–4787.

  • Lee, K. Y. (2004). Local fuzzy PCA based GMM with dimension reduction on speaker identification. Pattern Recognition Letters, 25, 1811–1817.

    Article  Google Scholar 

  • Lung, S.-Y. (2004). Further reduced form of wavelet feature for text independent speaker recognition. Pattern Recognition, 37, 1565–1566.

    Article  MATH  Google Scholar 

  • Lung, S.-Y. (2004). Adaptive fuzzy wavelet algorithm for text-independent speaker recognition. Pattern Recognition, 37, 2095–2096.

    Article  Google Scholar 

  • Mamta, & Hanmandlu, M. (2014). Robust authentication using the unconstrained infrared face images. Expert Systems with Applications, 41, 6494–6511.

    Article  Google Scholar 

  • Mamta, & Hanmandlu, M. (2014). A new entropy function and a classifier for thermal face recognition. Engineering Applications of Artificial Intelligence, 36, 269–286.

    Article  Google Scholar 

  • Medikonda, J., Madasu, H., & Panigrahi, B. K. (2016). Information set based gait authentication system. Neurocomputing, 207, 1–14.

    Article  Google Scholar 

  • Mirhassani, S. M., & Ting, H.-N. (2014). Fuzzy-based discriminative feature representation for children’s speech recognition. Digital Signal Processing, 31, 102–114.

    Article  Google Scholar 

  • NIST (2003). The NIST year 2003 speaker recognition evaluation plan. Available: http://www.itl.nist.gov/iad/mig/tests/sre/2003/2003-spkrec-evalplan-v2.2.pdf.

  • Pelecanos, J., & Sridharan, S. (2001). Feature Warping for Robust Speaker Verification, presented at the A Speaker Odyssey—The Speaker Recognition Workshop, Crete.

  • Pinheiro, H. N. B., Vieira, S. R. F., Ren, T. I., Cavalcanti, G. D. C., & de Mattos Neto, P. S. G. (2016). Type-2 fuzzy GMM for text-independent speaker verification under unseen noise conditions, In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5490–5494.

  • Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3, 72–83.

    Article  Google Scholar 

  • Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17, 91–108.

    Article  Google Scholar 

  • Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted gaussian mixture models. Digital Signal Processing, 10, 19–41.

    Article  Google Scholar 

  • Sohn, J., Kim, N. S., Sung, W. (1999). A statistical model-based voice activity detection”. IEEE Signal Processing Letters, 6, 1–3.

    Article  Google Scholar 

  • Togneri, R., & Pullella, D. (2011). An overview of speaker identification: accuracy and robustness issues. IEEE Transactions on Circuits and Systems Magazine, 11, 23–61.

    Article  Google Scholar 

  • VCTK (2009). The Centre for Speech Technology Research VCTK Corpus.

  • VoxForge (2015). VoxForge speech corpus. Available: http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/.

  • Wang, Y., Liu, X., Xing, Y., & Li, M. (2008). A Novel Reduction Method for Text-Independent Speaker Identification,” in 2008 Fourth International Conference on Natural Computation, pp. 66–70.

  • Yuan, Z. X., Yu, C. Z., & Fang, Y. (1993). Text independent speaker identification using fuzzy mathematical algorithm, In 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-93, Vol. 2., pp. 403–406.

  • Zhao X., & Wang D. L. (2013). Analyzing noise robustness of MFCC and GFCC features in speaker identification, In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7204–7208.

  • Zhao X., Shao Y., Wang D. L. (2012). CASA-based robust speaker identification. IEEE Transactions on Audio, Speech, and Language Processing, 20, 1608–1616.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Medikonda Jeevan.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Medikonda, J., Madasu, H. Higher order information set based features for text-independent speaker identification. Int J Speech Technol 21, 451–461 (2018). https://doi.org/10.1007/s10772-017-9472-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10772-017-9472-7

Keywords