Pattern Recognition

Volume 33, Issue 2, February 2000, Pages 309-315

Similarity normalization for speaker verification by fuzzy fusion

https://doi.org/10.1016/S0031-3203(99)00042-4

Abstract

Similarity or likelihood normalization techniques are important for speaker verification systems as they help to alleviate the variations in the speech signals. In the conventional normalization, the a priori probabilities of the cohort speakers are assumed to be equal. From this standpoint, we apply the theory of fuzzy measure and fuzzy integral to combine the likelihood values of the cohort speakers in a way that relaxes the assumption of equal a priori probabilities. This approach replaces the conventional normalization term by the fuzzy integral, which acts as a non-linear fusion of the similarity measures of an utterance assigned to the cohort speakers. We illustrate the performance of the proposed approach by testing the speaker verification system with both the conventional and the fuzzy algorithms using the commercial speech corpus TI46. The results, in terms of equal error rates, show that the speaker verification system using the fuzzy integral is more flexible and performs better than one using the conventional normalization method.

Introduction

Speaker verification is one of the challenging areas of speech research and has many applications, including telecommunications, security systems, banking transactions, database management, forensic tasks, command and control, and others. Technically, it is one of the two tasks in speaker recognition. In other words, a speaker recognition system can be divided into two categories: speaker identification and speaker verification. A speaker identification recognizer tries to assign an unknown speaker to one of the reference speakers based on the closest measure of similarity, whereas a speaker verification recognizer aims either to accept or to reject an unknown speaker by verifying the identity claim. Thus, the main point distinguishing these two tasks is the number of decision alternatives. For speaker identification, the number of decision alternatives equals the number of speakers. For speaker verification, there are only two alternatives, i.e. either accept or reject the claimed speaker. Different recognition tasks serve different purposes: verification systems are more appropriate for most commercial applications, whereas identification systems are useful for the study of parametric and speech material modeling. For more details on recent developments in speaker recognition, the reader is referred to Refs. [1], [2], [3].
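
As a rough illustration of the difference in decision alternatives, the following sketch contrasts the two rules. The placeholder score() function, the centroid-style speaker models and the threshold value are purely illustrative assumptions, not the system studied in this paper.

```python
import numpy as np

def score(X, model):
    # Placeholder similarity: negative mean squared distance between the feature
    # vectors X and a centroid-style speaker model (illustrative only).
    return -np.mean((X - model) ** 2)

def identify(X, reference_models):
    """Speaker identification: as many decision alternatives as reference speakers."""
    return max(reference_models, key=lambda name: score(X, reference_models[name]))

def verify(X, claimed_model, threshold):
    """Speaker verification: only two alternatives, accept or reject the claimed identity."""
    return score(X, claimed_model) >= threshold

# Toy usage with two hypothetical reference speakers
models = {"speaker_A": np.zeros(12), "speaker_B": np.ones(12)}
X = np.random.randn(20, 12) * 0.1
print(identify(X, models))                   # picks one of the reference speakers
print(verify(X, models["speaker_A"], -0.5))  # True (accept) or False (reject)
```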

In speaker verification systems, normalization techniques are important as they help to alleviate the variations in the speech signals, which are due to noise and different recording and transmission conditions [1]. There are two types of normalization techniques for speaker recognition: parameter-domain and similarity-domain. Typical works in the parameter domain were proposed by Atal [4] and Furui [5], and in the similarity domain by Higgins et al. [6] and Matsui and Furui [7]. It has also been reported that most speaker verification systems are based on similarity-domain normalization [8]. In this paper, we therefore focus our attention on the verification mode with respect to similarity normalization.

Generally, in most similarity normalization techniques, the likelihood values of the utterance coming from the cohort speakers, whose models are closest to the claimant model, are assumed to be equally likely. In reality, however, this assumption is often not true, as the similarity measures between each cohort speaker and the client speaker may differ. Motivated by this drawback, we introduce a new normalized log-likelihood method using the concept of fuzzy fusion. We relax the assumption of equal likelihood by imposing fuzzy measures of the similarities between the cohort speaker models and the client model. The scoring of the cohort models can then be obtained by the fuzzy integral, which acts as a fusion operator with respect to the fuzzy measures. The rest of this paper is organized as follows. In Section 2, we present the basic formulations of the normalization techniques in the similarity domain. In Section 3, the concepts of fuzzy measure and fuzzy integral are introduced. The fuzzy fusion for scoring the normalized log-likelihood is implemented in Section 4. We compare the performance of the conventional and the proposed techniques using a commercial speech database in Section 5. Finally, Section 6 concludes this new application for speaker recognition and suggests possible developments.

Section snippets

Similarity-domain normalization

Given an input set of speech feature vectors X={x1,x2,…,xN}, the verification system has to decide whether X was spoken by the client (for the sake of simplicity, from now on we will denote X as x). In the similarity domain, this can be seen as a statistical test between H0: S and H1: S′, where H0 is the null hypothesis that the claimant is the client S, while H1 is the alternative hypothesis that the claimant is an impostor S′. The decision according to the Bayesian rule for minimum risk
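
The snippet above is cut off before the scoring formula, but a common form of the conventional similarity normalization compares the claimant's log-likelihood against an equally weighted cohort term. The sketch below assumes that form (normalized log-likelihood with an arithmetic mean over the cohort likelihoods); the function names and the threshold theta are illustrative, not taken from the paper.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log of a sum of exponentials."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def normalized_log_likelihood(loglik_client, logliks_cohort):
    """Conventional cohort normalization: the normalization term is the log of the
    arithmetic mean of the cohort likelihoods, i.e. equal a priori weights."""
    logliks_cohort = np.asarray(logliks_cohort, dtype=float)
    cohort_term = logsumexp(logliks_cohort) - np.log(len(logliks_cohort))
    return loglik_client - cohort_term

def accept_claim(loglik_client, logliks_cohort, theta):
    """Accept H0 (the claimant is the client S) when the normalized score exceeds theta."""
    return normalized_log_likelihood(loglik_client, logliks_cohort) >= theta
```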

Fuzzy measure and fuzzy integral

Stemming from the concept of fuzzy sets by Zadeh [10], the theory of fuzzy measures and fuzzy integrals was first introduced by Sugeno [11]. Fuzzy measures are used as subjective scales for grades of fuzziness that can be expressed as a “grade of importance”, a “grade of closeness”, etc. In mathematical terms, a fuzzy measure is a set function with monotonicity but not always additivity. Based on the notion of a fuzzy measure, a fuzzy integral is a functional with monotonicity which is used for
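
Because the snippet is truncated, the following sketch only shows the standard Sugeno lambda-fuzzy measure and Sugeno fuzzy integral, consistent with the description above; the root-finding routine, the numeric brackets and the assumption that the fuzzy densities lie strictly between 0 and 1 are implementation choices made here, not details from the paper.

```python
import numpy as np
from scipy.optimize import brentq

def sugeno_lambda(densities):
    """Solve prod_i(1 + lam * g_i) = 1 + lam for the unique lam > -1
    (lam = 0 when the densities already sum to one)."""
    g = np.asarray(densities, dtype=float)
    if np.isclose(g.sum(), 1.0):
        return 0.0
    # Work in the log domain to avoid overflow for large lam.
    f = lambda lam: np.sum(np.log1p(lam * g)) - np.log1p(lam)
    if g.sum() > 1.0:
        return brentq(f, -1.0 + 1e-12, -1e-9)  # root lies in (-1, 0)
    return brentq(f, 1e-9, 1e12)               # root lies in (0, inf)

def sugeno_integral(evidence, densities):
    """Sugeno fuzzy integral: max over i of min(h_(i), g(A_i)), where the sources are
    sorted by decreasing evidence and g(A_i) follows the lambda-measure recursion."""
    h = np.asarray(evidence, dtype=float)
    g = np.asarray(densities, dtype=float)
    lam = sugeno_lambda(g)
    order = np.argsort(h)[::-1]
    h, g = h[order], g[order]
    G = g[0]                                   # measure of the top-ranked singleton
    value = min(h[0], G)
    for i in range(1, len(h)):
        G = g[i] + G + lam * g[i] * G          # g(A_i) = g_i + g(A_{i-1}) + lam*g_i*g(A_{i-1})
        value = max(value, min(h[i], G))
    return value
```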

Fuzzy-fusion based normalization

As mentioned in the foregoing sections, the a priori probability of an utterance, given that it comes from one of the cohort speakers, is assumed to be equal in the conventional similarity normalization methods. Here, we use the concept of the fuzzy measure to calculate the grades of similarity or closeness between each cohort speaker model and the client model, i.e. the fuzzy densities, and the multi-attributes of these fuzzy densities. The final score for the normalization of the cohort
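
This snippet is also cut off before the final scoring expression, so the following is only a rough sketch of how a fuzzy integral could replace the equally weighted cohort average in the normalization term. The mapping from model closeness to fuzzy densities and the rescaling of the cohort likelihoods are assumptions made here for illustration; they are not the paper's exact definitions. The sugeno_integral() function is the one sketched in the previous section.

```python
import numpy as np

def fuzzy_cohort_normalization(loglik_client, logliks_cohort, closeness):
    """Hypothetical fuzzy-fusion normalization.
    closeness[b] is a grade in (0, 1) of how close cohort model b is to the client
    model; it plays the role of the fuzzy density g_b."""
    logliks_cohort = np.asarray(logliks_cohort, dtype=float)
    # Express cohort likelihoods on a relative (0, 1] scale so they can act as the
    # evidence h fed to the Sugeno integral.
    ref = logliks_cohort.max()
    h = np.exp(logliks_cohort - ref)
    fused = sugeno_integral(h, closeness)  # non-linear fusion instead of an equal-weight mean
    cohort_term = np.log(fused) + ref      # back onto the log-likelihood scale
    return loglik_client - cohort_term

# Toy usage: three cohort speakers with unequal grades of closeness to the client
print(fuzzy_cohort_normalization(-41.2, [-44.0, -47.5, -52.1], closeness=[0.6, 0.4, 0.2]))
```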

Measure of performance

One of the most common performance measures for speaker verification systems is the equal error rate (EER), which applies an a posteriori threshold to make the false acceptance error rate equal to the false rejection error rate. If the score of an identity claim is above a certain threshold then it is verified as the true speaker, otherwise the claim is rejected. If the threshold is set high then there is a risk of rejecting a true speaker. Conversely, if the threshold is set low then there
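
As a concrete illustration of this measure, the sketch below sweeps an a posteriori threshold over the pooled client and impostor scores and reports the operating point at which the false acceptance and false rejection rates are closest; averaging the two rates at that point is a tie-breaking convention assumed here, not a detail from the paper.

```python
import numpy as np

def equal_error_rate(client_scores, impostor_scores):
    """Find the a posteriori threshold where false acceptance ~= false rejection."""
    client_scores = np.asarray(client_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([client_scores, impostor_scores]))
    best = None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # impostors wrongly accepted
        frr = np.mean(client_scores < t)     # true speakers wrongly rejected
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    t, far, frr = best
    return (far + frr) / 2.0, t              # EER estimate and the threshold achieving it

# Toy usage with hypothetical normalized scores
eer, threshold = equal_error_rate([2.1, 1.7, 2.4, 1.9], [0.3, 1.8, 0.9, 0.5])
print(eer, threshold)
```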

Conclusions

A fusion algorithm based on the fuzzy integral has been proposed and implemented in the similarity normalization for speaker verification. The experimental results show that the application of the proposed method is superior to that of the conventional normalization. The key difference between the two methods is that the assumption of equal a priori probabilities is not necessary for the fuzzy integral-based normalization, owing to the concept of the fuzzy measure. In fact, applications of


References (27)

  • J.P. Campbell, Speaker recognition: a tutorial, Proc. IEEE (1997)
  • G.R. Doddington, Speaker recognition evaluation methodology – an overview and perspective, Proceedings of Workshop on...
  • B.S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, J. Acoust. Soc. Am. (1974)

About the Author—TUAN D. PHAM received the B.E. degree (1990) in Civil Engineering from the University of Wollongong, the Ph.D. degree (1995) in Civil Engineering, with a thesis on fuzzy-set modeling in the finite element analysis of engineering problems, from the University of New South Wales. From 1994 to 1995, he was a senior systems analyst with Engineering Computer Services Ltd, and from 1996 to early 1997 he was a post-doctoral fellow with the Laboratory for Imaging Science and Engineering in the Department of Electrical Engineering at the University of Sydney. From 1997 to 1998 he held a research fellow position with the Laboratory for Human-Computer Communication in the Faculty of Information Sciences and Engineering at the University of Canberra, and he is now a lecturer in the School of Computing in the same Faculty. He is a co-author of 2 monographs, author and co-author of over 40 technical papers published in popular journals and conferences. His main research interests include the applications of computational intelligence and statistical techniques to pattern recognition, particularly in image processing, speech and speaker recognition. Dr. Pham is a member of the IEEE.

About the Author—MICHAEL WAGNER received a Diplomphysiker degree from the University of Munich in 1973 and a PhD in Computer Science from the Australian National University in 1979 with a thesis on learning networks for speaker recognition. Dr Wagner has been involved in speech and speaker recognition research since and has held research and teaching positions at the Technical University of Munich, National University of Singapore, University of Wollongong, University of New South Wales and the Australian National University. He was the Foundation President of the Australian Speech Science and Technology Association from 1986 to 1992 and is currently a professor and head of the School of Computing at the University of Canberra. Dr. Michael Wagner is a fellow of IEAust and a member of ASSTA, ESCA and IEEE.
