Abstract
This work explores the use of phoneme-level information in cohort selection to improve the performance of a speaker verification system. In speaker verification, a cohort is used in score normalization, a technique that reduces the undesirable score variation arising from acoustically mismatched conditions. Proper selection of the cohort significantly improves speaker verification performance. In this paper, we investigate cohort selection based on a speaker model cluster under the i-vector framework, which we call the i-vector model cluster (IMC). Two approaches for cohort selection are proposed. The first approach, called speaker-specific cohort selection (SSCS), uses speaker-level information for cohort selection. The second approach, phoneme-specific cohort selection (PSCS), improves cohort set selection by using phoneme-level information. Phoneme-level information is further employed in a late fusion approach that applies majority voting to normalized scores to improve the performance of the speaker verification system. Speaker verification experiments were conducted using the TIMIT, HINDI and YOHO databases. Compared to the standard ZT-Norm method, the proposed method obtains equal error rate improvements of 19.01%, 14.61% and 19.4% on the TIMIT, HINDI and YOHO datasets, respectively. Reasonable improvements in performance are also obtained in terms of the minimum decision cost function (min DCF) and detection error trade-off (DET) curves.
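To make the ideas of cohort-based score normalization and late fusion by majority voting more concrete, the following is a minimal sketch and not the authors' implementation. It assumes cosine scoring between i-vector-like vectors, a hypothetical cohort selection rule (the k candidates most similar to the target model), a T-norm-style normalization (cohort models scored against the test vector), and a simple majority vote over per-stream normalized scores. All function names, the selection rule, and the threshold are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_cohort(target, candidates, k):
    """Hypothetical rule: keep the k candidates closest to the target model."""
    ranked = sorted(candidates, key=lambda c: cosine(target, c), reverse=True)
    return ranked[:k]


def normalized_score(target, test, cohort):
    """T-norm-style normalization: shift and scale the raw score by the
    mean and standard deviation of the cohort's scores on the same test vector."""
    raw = cosine(target, test)
    cohort_scores = [cosine(c, test) for c in cohort]
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    # Fall back to mean subtraction if the cohort scores have no spread.
    return (raw - mu) / sigma if sigma > 0 else raw - mu


def majority_vote(scores, threshold):
    """Accept when more than half of the normalized scores exceed the threshold."""
    votes = sum(1 for s in scores if s > threshold)
    return votes > len(scores) / 2


if __name__ == "__main__":
    # Toy 2-D vectors standing in for i-vectors (assumed data).
    target = [1.0, 0.0]
    test = [1.0, 0.1]
    candidates = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3], [-1.0, 0.0]]
    cohort = select_cohort(target, candidates, k=2)
    print(normalized_score(target, test, cohort))
    print(majority_vote([1.2, -0.3, 0.8], threshold=0.0))
```

In this sketch the cohort is chosen per target model, which mirrors the general idea of client-wise cohort selection; a phoneme-specific variant would repeat the same steps on vectors extracted per phoneme and pass the resulting normalized scores to the vote.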












Cite this article
Ahmad, W., Karnick, H. & Hegde, R.M. Client-wise cohort set selection by combining speaker- and phoneme-specific I-vectors for speaker verification. Multimed Tools Appl 77, 8273–8294 (2018). https://doi.org/10.1007/s11042-017-4723-9
DOI: https://doi.org/10.1007/s11042-017-4723-9