An information set-based robust text-independent speaker authentication

Medikonda, Jeevan; Bhardwaj, Saurabh; Madasu, Hanmandlu

doi:10.1007/s00500-019-04277-9

Methodologies and Application
Published: 14 August 2019

Volume 24, pages 5271–5287, (2020)

Soft Computing

Abstract

This paper presents a method for extracting twofold information set (TFIS) features for text-independent speaker recognition. The method takes the Mel frequency cepstral coefficients (MFCCs) from the frames of a sample speech signal and arranges them in a matrix. From this matrix, spatial and temporal information components are derived using the information set concept within an entropy framework. The TFIS features, formed by combining these two components, are few in number, which reduces computation time and complexity and improves performance in noisy environments. The proposed approach is tested on three datasets, namely NIST-2003, the VoxForge 2014 speech corpus and the VCTK speech corpus, in terms of speed, computational complexity, memory requirement and accuracy. Its performance is validated under different noisy environments at different signal-to-noise ratios.
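
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract gives no equations, so the Gaussian membership function, the averaging step, and the names `gaussian_membership` and `twofold_features` are all assumptions made for illustration. The sketch treats an information value as an MFCC entry weighted by its membership grade, aggregates once along the frame axis and once along the coefficient axis, and concatenates the two results. Which of the two corresponds to the paper's "spatial" or "temporal" component is likewise an assumption, and a random matrix stands in for real MFCCs.

```python
import numpy as np


def gaussian_membership(x, axis):
    """Gaussian membership grade of each entry, using the mean and variance
    taken along `axis` (an assumed membership function, not from the paper)."""
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True) + 1e-12  # avoid division by zero
    return np.exp(-((x - mean) ** 2) / (2.0 * var))


def twofold_features(mfcc):
    """Build a two-component feature vector from an MFCC matrix.

    mfcc: array of shape (n_frames, n_coeffs).
    Returns a 1-D array of length n_coeffs + n_frames.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    if mfcc.ndim != 2:
        raise ValueError("expected a 2-D (n_frames, n_coeffs) matrix")

    # Component 1: weight each entry by its membership along the frame axis,
    # then average over frames, giving one value per coefficient.
    per_coeff = (mfcc * gaussian_membership(mfcc, axis=0)).mean(axis=0)

    # Component 2: weight each entry by its membership along the coefficient
    # axis, then average over coefficients, giving one value per frame.
    per_frame = (mfcc * gaussian_membership(mfcc, axis=1)).mean(axis=1)

    return np.concatenate([per_coeff, per_frame])


# Stand-in for real MFCCs: 50 frames of 13 coefficients each.
rng = np.random.default_rng(0)
mfcc_matrix = rng.normal(size=(50, 13))
features = twofold_features(mfcc_matrix)
print(features.shape)  # (63,)
```

In this sketch the per-frame component grows with utterance length, so a practical system would need a further pooling step to obtain a fixed-length vector; the abstract does not say how the paper handles this.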

Acknowledgements

This is a part of the ongoing project on “Personal Authentication using Multimodal Behavioral Biometrics: Voice and Gait” and the authors express their gratitude to the Department of Science and Technology, Government of India (Grant No. SB/S3/EECE/0127/2013) for funding the project.

Author information

Authors and Affiliations

Manipal Academy of Higher Education, Manipal, Karnataka, India
Jeevan Medikonda
Thapar Institute of Engineering and Technology, Patiala, India
Saurabh Bhardwaj
Indian Institute of Technology, New Delhi, India
Hanmandlu Madasu

Corresponding author

Correspondence to Jeevan Medikonda.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with direct human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Medikonda, J., Bhardwaj, S. & Madasu, H. An information set-based robust text-independent speaker authentication. Soft Comput 24, 5271–5287 (2020). https://doi.org/10.1007/s00500-019-04277-9

Published: 14 August 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00500-019-04277-9
