Pattern Recognition

Volume 33, Issue 2, February 2000, Pages 309-315

Similarity normalization for speaker verification by fuzzy fusion

https://doi.org/10.1016/S0031-3203(99)00042-4

Abstract

Similarity or likelihood normalization techniques are important for speaker verification systems as they help to alleviate the variations in the speech signals. In the conventional normalization, the a priori probabilities of the cohort speakers are assumed to be equal. From this standpoint, we apply the theory of fuzzy measure and fuzzy integral to combine the likelihood values of the cohort speakers in a way that relaxes the assumption of equal a priori probabilities. This approach replaces the conventional normalization term by the fuzzy integral, which acts as a non-linear fusion of the similarity measures of an utterance assigned to the cohort speakers. We illustrate the performance of the proposed approach by testing the speaker verification system with both the conventional and the fuzzy algorithms using the commercial speech corpus TI46. The results, in terms of equal error rates, show that the speaker verification system using the fuzzy integral is more flexible and performs better than one using the conventional normalization method.

Introduction

Speaker verification is one of the challenging areas of speech research and has many applications, including telecommunications, security systems, banking transactions, database management, forensic tasks, command and control, and others. Technically, it is one of the two tasks in speaker recognition. In other words, a speaker recognition system can be divided into two categories: speaker identification and speaker verification. A speaker identification recognizer tries to assign an unknown speaker to one of the reference speakers based on the closest measure of similarity, whereas a speaker verification recognizer aims either to accept or to reject an unknown speaker by verifying the identity claim. Thus, the main point distinguishing these two tasks is the number of decision alternatives. For speaker identification, the number of decision alternatives equals the number of speakers. For speaker verification, there are only two alternatives, i.e. either accept or reject the claimed speaker. Different recognition tasks serve different purposes: verification systems are more appropriate for most commercial applications, whereas identification systems are useful for the study of parametric and speech material modeling. For more details on recent developments in speaker recognition, the reader is referred to Refs. [1], [2], [3].
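
As a rough illustration of the difference in decision alternatives, the following sketch contrasts the two rules. The placeholder score() function, the centroid-style speaker models and the threshold value are purely illustrative assumptions, not the system studied in this paper.

```python
import numpy as np

def score(X, model):
    # Placeholder similarity: negative mean squared distance between the feature
    # vectors X and a centroid-style speaker model (illustrative only).
    return -np.mean((X - model) ** 2)

def identify(X, reference_models):
    """Speaker identification: as many decision alternatives as reference speakers."""
    return max(reference_models, key=lambda name: score(X, reference_models[name]))

def verify(X, claimed_model, threshold):
    """Speaker verification: only two alternatives, accept or reject the claimed identity."""
    return score(X, claimed_model) >= threshold

# Toy usage with two hypothetical reference speakers
models = {"speaker_A": np.zeros(12), "speaker_B": np.ones(12)}
X = np.random.randn(20, 12) * 0.1
print(identify(X, models))                   # picks one of the reference speakers
print(verify(X, models["speaker_A"], -0.5))  # True (accept) or False (reject)
```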

In speaker verification systems, normalization techniques are important as they help to alleviate the variations in the speech signals, which are due to noise and different recording and transmission conditions [1]. There are two types of normalization techniques for speaker recognition: parameter-domain and similarity-domain. Typical works in the parameter domain were proposed by Atal [4] and Furui [5], and in the similarity domain by Higgins et al. [6] and Matsui and Furui [7]. It has also been reported that most speaker verification systems are based on similarity-domain normalization [8]. In this paper, we therefore focus our attention on the verification mode with respect to similarity normalization.

Generally, in most similarity normalization techniques, the likelihood values of the utterance coming from the cohort speakers, whose models are closest to the claimant model, are assumed to be equally likely. In reality, however, this assumption is often not true, as the similarity measures between each cohort speaker and the client speaker may differ. Motivated by this drawback, we introduce a new normalized log-likelihood method using the concept of fuzzy fusion. We relax the assumption of equal likelihood by imposing fuzzy measures of the similarities between the cohort speaker models and the client model. The scoring of the cohort models can then be obtained by the fuzzy integral, which acts as a fusion operator with respect to the fuzzy measures. The rest of this paper is organized as follows. In Section 2, we present the basic formulations of the normalization techniques in the similarity domain. In Section 3, the concepts of fuzzy measure and fuzzy integral are introduced. The fuzzy fusion for scoring the normalized log-likelihood is implemented in Section 4. We compare the performance of the conventional and the proposed techniques using a commercial speech database in Section 5. Finally, Section 6 concludes this new application for speaker recognition and suggests possible developments.

Section snippets

Similarity-domain normalization

Given an input set of speech feature vectors X={x1,x2,…,xN}, the verification system has to decide whether X was spoken by the client (for the sake of simplicity, from now on we will denote X as x). In the similarity domain, this can be seen as a statistical test between H0: S and H1: S′, where H0 is the null hypothesis that the claimant is the client S, while H1 is the alternative hypothesis that the claimant is an impostor S′. The decision according to the Bayesian rule for minimum risk
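
The snippet above is cut off before the scoring formula, but a common form of the conventional similarity normalization compares the claimant's log-likelihood against an equally weighted cohort term. The sketch below assumes that form (normalized log-likelihood with an arithmetic mean over the cohort likelihoods); the function names and the threshold theta are illustrative, not taken from the paper.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log of a sum of exponentials."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def normalized_log_likelihood(loglik_client, logliks_cohort):
    """Conventional cohort normalization: the normalization term is the log of the
    arithmetic mean of the cohort likelihoods, i.e. equal a priori weights."""
    logliks_cohort = np.asarray(logliks_cohort, dtype=float)
    cohort_term = logsumexp(logliks_cohort) - np.log(len(logliks_cohort))
    return loglik_client - cohort_term

def accept_claim(loglik_client, logliks_cohort, theta):
    """Accept H0 (the claimant is the client S) when the normalized score exceeds theta."""
    return normalized_log_likelihood(loglik_client, logliks_cohort) >= theta
```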

Fuzzy measure and fuzzy integral

Stemming from the concept of fuzzy sets by Zadeh [10], the theory of fuzzy measures and fuzzy integrals was first introduced by Sugeno [11]. Fuzzy measures are used as subjective scales for grades of fuzziness that can be expressed as a “grade of importance”, a “grade of closeness”, etc. In mathematical terms, a fuzzy measure is a set function with monotonicity but not always additivity. Based on the notion of a fuzzy measure, a fuzzy integral is a functional with monotonicity which is used for
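
Because the snippet is truncated, the following sketch only shows the standard Sugeno lambda-fuzzy measure and Sugeno fuzzy integral, consistent with the description above; the root-finding routine, the numeric brackets and the assumption that the fuzzy densities lie strictly between 0 and 1 are implementation choices made here, not details from the paper.

```python
import numpy as np
from scipy.optimize import brentq

def sugeno_lambda(densities):
    """Solve prod_i(1 + lam * g_i) = 1 + lam for the unique lam > -1
    (lam = 0 when the densities already sum to one)."""
    g = np.asarray(densities, dtype=float)
    if np.isclose(g.sum(), 1.0):
        return 0.0
    # Work in the log domain to avoid overflow for large lam.
    f = lambda lam: np.sum(np.log1p(lam * g)) - np.log1p(lam)
    if g.sum() > 1.0:
        return brentq(f, -1.0 + 1e-12, -1e-9)  # root lies in (-1, 0)
    return brentq(f, 1e-9, 1e12)               # root lies in (0, inf)

def sugeno_integral(evidence, densities):
    """Sugeno fuzzy integral: max over i of min(h_(i), g(A_i)), where the sources are
    sorted by decreasing evidence and g(A_i) follows the lambda-measure recursion."""
    h = np.asarray(evidence, dtype=float)
    g = np.asarray(densities, dtype=float)
    lam = sugeno_lambda(g)
    order = np.argsort(h)[::-1]
    h, g = h[order], g[order]
    G = g[0]                                   # measure of the top-ranked singleton
    value = min(h[0], G)
    for i in range(1, len(h)):
        G = g[i] + G + lam * g[i] * G          # g(A_i) = g_i + g(A_{i-1}) + lam*g_i*g(A_{i-1})
        value = max(value, min(h[i], G))
    return value
```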

Fuzzy-fusion based normalization

As mentioned in the foregoing sections, the a priori probability of an utterance, given that it comes from one of the cohort speakers, is assumed to be equal in the conventional similarity normalization methods. Here, we use the concept of the fuzzy measure to calculate the grades of similarity or closeness between each cohort speaker model and the client model, i.e. the fuzzy densities, and the multi-attributes of these fuzzy densities. The final score for the normalization of the cohort
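
This snippet is also cut off before the final scoring expression, so the following is only a rough sketch of how a fuzzy integral could replace the equally weighted cohort average in the normalization term. The mapping from model closeness to fuzzy densities and the rescaling of the cohort likelihoods are assumptions made here for illustration; they are not the paper's exact definitions. The sugeno_integral() function is the one sketched in the previous section.

```python
import numpy as np

def fuzzy_cohort_normalization(loglik_client, logliks_cohort, closeness):
    """Hypothetical fuzzy-fusion normalization.
    closeness[b] is a grade in (0, 1) of how close cohort model b is to the client
    model; it plays the role of the fuzzy density g_b."""
    logliks_cohort = np.asarray(logliks_cohort, dtype=float)
    # Express cohort likelihoods on a relative (0, 1] scale so they can act as the
    # evidence h fed to the Sugeno integral.
    ref = logliks_cohort.max()
    h = np.exp(logliks_cohort - ref)
    fused = sugeno_integral(h, closeness)  # non-linear fusion instead of an equal-weight mean
    cohort_term = np.log(fused) + ref      # back onto the log-likelihood scale
    return loglik_client - cohort_term

# Toy usage: three cohort speakers with unequal grades of closeness to the client
print(fuzzy_cohort_normalization(-41.2, [-44.0, -47.5, -52.1], closeness=[0.6, 0.4, 0.2]))
```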

Measure of performance

One of the most common performance measures for speaker verification systems is the equal error rate (EER), which applies an a posteriori threshold to make the false acceptance error rate equal to the false rejection error rate. If the score of an identity claim is above a certain threshold then it is verified as the true speaker, otherwise the claim is rejected. If the threshold is set high then there is a risk of rejecting a true speaker. Conversely, if the threshold is set low then there
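
As a concrete illustration of this measure, the sketch below sweeps an a posteriori threshold over the pooled client and impostor scores and reports the operating point at which the false acceptance and false rejection rates are closest; averaging the two rates at that point is a tie-breaking convention assumed here, not a detail from the paper.

```python
import numpy as np

def equal_error_rate(client_scores, impostor_scores):
    """Find the a posteriori threshold where false acceptance ~= false rejection."""
    client_scores = np.asarray(client_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([client_scores, impostor_scores]))
    best = None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # impostors wrongly accepted
        frr = np.mean(client_scores < t)     # true speakers wrongly rejected
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    t, far, frr = best
    return (far + frr) / 2.0, t              # EER estimate and the threshold achieving it

# Toy usage with hypothetical normalized scores
eer, threshold = equal_error_rate([2.1, 1.7, 2.4, 1.9], [0.3, 1.8, 0.9, 0.5])
print(eer, threshold)
```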

Conclusions

A fusion algorithm based on the fuzzy integral has been proposed and implemented in the similarity normalization for speaker verification. The experimental results show that the application of the proposed method is superior to that of the conventional normalization. The key difference between the two methods is that the assumption of equal a priori probabilities is not necessary for the fuzzy integral-based normalization, owing to the concept of the fuzzy measure. In fact, applications of


References (27)

  • J.P. Campbell, Speaker recognition: a tutorial, Proc. IEEE (1997)
  • G.R. Doddington, Speaker recognition evaluation methodology – an overview and perspective, Proceedings of Workshop on...
  • B.S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, J. Acoust. Soc. Am. (1974)

About the Author—TUAN D. PHAM received the B.E. degree (1990) in Civil Engineering from the University of Wollongong, the Ph.D. degree (1995) in Civil Engineering, with a thesis on fuzzy-set modeling in the finite element analysis of engineering problems, from the University of New South Wales. From 1994 to 1995, he was a senior systems analyst with Engineering Computer Services Ltd, and from 1996 to early 1997 he was a post-doctoral fellow with the Laboratory for Imaging Science and Engineering in the Department of Electrical Engineering at the University of Sydney. From 1997 to 1998 he held a research fellow position with the Laboratory for Human-Computer Communication in the Faculty of Information Sciences and Engineering at the University of Canberra, and he is now a lecturer in the School of Computing in the same Faculty. He is a co-author of 2 monographs, author and co-author of over 40 technical papers published in popular journals and conferences. His main research interests include the applications of computational intelligence and statistical techniques to pattern recognition, particularly in image processing, speech and speaker recognition. Dr. Pham is a member of the IEEE.

About the Author—MICHAEL WAGNER received a Diplomphysiker degree from the University of Munich in 1973 and a PhD in Computer Science from the Australian National University in 1979 with a thesis on learning networks for speaker recognition. Dr Wagner has been involved in speech and speaker recognition research since and has held research and teaching positions at the Technical University of Munich, National University of Singapore, University of Wollongong, University of New South Wales and the Australian National University. He was the Foundation President of the Australian Speech Science and Technology Association from 1986 to 1992 and is currently a professor and head of the School of Computing at the University of Canberra. Dr. Michael Wagner is a fellow of IEAust and a member of ASSTA, ESCA and IEEE.
