
Neurocomputing

Volume 20, Issues 1–3, 31 August 1998, Pages 227-252

A method of combining multiple probabilistic classifiers through soft competition on different feature sets

https://doi.org/10.1016/S0925-2312(98)00019-8

Abstract

A novel method is proposed for combining multiple probabilistic classifiers on different feature sets. To achieve improved classification performance, a generalized finite mixture model is proposed as a linear combination scheme and implemented based on radial basis function networks. In the linear combination scheme, soft competition on different feature sets is adopted as an automatic feature ranking mechanism so that different feature sets can always be used simultaneously, in an optimal way, to determine the linear combination weights. To train the linear combination scheme, a learning algorithm is developed based on the Expectation–Maximization (EM) algorithm. The proposed method has been applied to a typical real-world problem, viz., speaker identification, in which different feature sets often need to be considered simultaneously for robustness. Simulation results show that the proposed method yields good performance in speaker identification.

Introduction

The problem of pattern classification can be stated as follows: given a set of training data, each with an associated label, find a classification system that will produce the correct label for any data drawn from the same source as the training data. As illustrated in Fig. 1a, such a classification system is in general composed of three stages: preprocessing, feature extraction, and classification. In particular, feature extraction is necessary to avoid the so-called curse-of-dimensionality problem [18], which may lead to prohibitively expensive computation in the classification stage. The performance of a classification system therefore depends heavily upon the feature set used. For a classification task, numerous types of features can be extracted from the same raw data by different methods, and a selection technique is often adopted to find an optimal feature set for use in classification [6]. Sometimes, however, it is impossible to find such an optimal feature set. Instead, several different feature sets may yield similar classification performance, so that none of them is optimal or robust for the specific classification task. Because different feature sets represent the raw data from different viewpoints, the simultaneous use of different feature sets can lead to a better or more robust classification result. As illustrated in Fig. 1b, this kind of problem is called pattern classification on different feature sets in this paper. Many real-world problems belong to this category. A typical example is speaker identification, which classifies an unlabeled voice token as belonging to one of a set of reference speakers. For this problem, several different spectral feature sets have turned out to be useful, but none of them can be regarded as optimal or robust. It has been suggested that multiple spectral feature sets need to be considered simultaneously for robustness in speaker identification [20], [21]. As a result, a technique that efficiently utilizes different feature sets becomes a solution to pattern classification on different feature sets.

For the simultaneous use of different feature sets, a traditional method is to lump the different feature vectors together into a single composite feature vector. Although there are several ways to form a composite feature vector, the use of a composite feature set may cause the following problems: (1) curse of dimensionality: the dimension of a composite feature vector becomes much higher than that of any component feature vector; (2) difficulty in formation: it is often difficult to lump several different feature vectors together due to their diversified forms; (3) redundancy: the component feature vectors are usually not independent of each other. Therefore, the composite-feature-based method can achieve only limited success. Recently, the combination of multiple classifiers has been viewed as a new direction for the development of highly reliable pattern recognition systems. Preliminary results indicate that combining several complementary classifiers may lead to improved performance [26], [25], [3], [38], [33], [32], [35], [23], [46]. There are at least two reasons for the necessity of combining multiple classifiers. On the one hand, in almost all current pattern recognition application areas there are a number of classification algorithms developed from different theories and methodologies. Each of these classifiers may reach a certain degree of success, but none of them is totally perfect or as good as expected in practical applications. On the other hand, the demand for a solution to pattern classification on different feature sets provides the other reason. A combination technique allows multiple classifiers to work on different feature sets so that the feature sets can be utilized simultaneously. Therefore, a method of combining multiple classifiers on different feature sets provides an alternative way to solve the problem of pattern classification on different feature sets.
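Since the paper gives no concrete example of composite-feature formation, the following minimal sketch illustrates the traditional lumping approach and the dimensionality problem it causes; the feature names and dimensions are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical illustration: three different feature sets extracted from the
# same utterance are lumped into one composite vector by concatenation.
lpc_features = np.random.randn(12)    # e.g., 12 LPC-derived coefficients
mfcc_features = np.random.randn(20)   # e.g., 20 cepstral coefficients
pitch_features = np.random.randn(4)   # e.g., 4 prosodic measurements

composite = np.concatenate([lpc_features, mfcc_features, pitch_features])
print(composite.shape)  # (36,): much higher-dimensional than any component,
                        # illustrating the curse-of-dimensionality concern
```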

From the viewpoint of statistics, the combination of multiple classifiers can be viewed as the combination of multiple probability distributions if each classifier is interpreted as an estimator of a probability distribution. In general, there are two frameworks for performing such a combination [27]. One is that of a decision maker who consults several classifiers regarding some event: the classifiers express their opinions in the form of probability distributions, and the decision maker must aggregate the classifiers' distributions into a single distribution that can be used to make the final decision. The other is the framework of linear opinion pools, in which the decision maker forms a linear combination of the classifiers' opinions. Under these two frameworks, there have been extensive studies of the combination of multiple probabilistic classifiers [27], [22], [44], [46], [10], [1]. In terms of pattern classification on different feature sets, combination techniques under the framework of a decision maker, e.g. a supra-Bayesian procedure [27], [46], can be used directly for this problem, since only the outputs of the classifiers are considered for combination, regardless of their inputs. On the other hand, several combination techniques under the framework of linear opinion pools can also be applied directly to the combination of multiple classifiers on different feature sets [33], [32], [35], [23], [7], [43]. These techniques are based on constrained or unconstrained least-squares regression, with model selection to give the system good generalization properties. However, the existing techniques of linear opinion pools with weights as veridical probabilities [27], [45], [44] cannot be used for pattern classification on different feature sets unless a single composite feature set is used, since the process of generating the weights for linear combination depends heavily upon the inputs of the multiple classifiers.
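For concreteness, the linear opinion pool referred to above can be written as a convex combination of the classifiers' posterior distributions; the formulation below uses our own notation, since the paper's equations are not reproduced in this excerpt:

```latex
% Linear opinion pool: the aggregated posterior over class labels \omega_j
% is a convex combination of the N classifiers' posterior estimates.
P(\omega_j \mid x) \;=\; \sum_{i=1}^{N} w_i \, P_i(\omega_j \mid x),
\qquad w_i \ge 0, \quad \sum_{i=1}^{N} w_i = 1.
```

The weight w_i encodes the decision maker's confidence in classifier i; the techniques cited above differ mainly in how these weights are estimated.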

In this paper, we propose a new linear combination scheme that extends the existing techniques of linear opinion pools with weights as veridical probabilities to pattern classification on different feature sets. In contrast to the winner-take-all mechanism, soft competition is a mechanism in which a competitor and its rivals work together on the same task, but the winner plays a more important role than the losers. In the linear combination scheme, we adopt such a soft competition mechanism on different feature sets to determine the weights for linear combination in an optimal way. An EM learning algorithm is also proposed for parameter estimation in the linear combination scheme. To demonstrate its effectiveness, we have applied the proposed method to a real-world problem, viz., speaker identification, in which diversified feature sets need to be considered simultaneously for robustness. Simulation results show that the proposed method yields satisfactory performance in speaker identification.
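The exact form of the scheme appears only in the full text (Section 2); the abstract and this paragraph suggest a gated mixture of the following general shape. Treat the notation below as our hedged reading, not the paper's own equations:

```latex
% Sketch (assumptions ours): the classifier trained on feature set x_k
% contributes with a weight g_k produced by a soft competition among the
% K feature sets; every feature set contributes, but the winner dominates.
P(\omega_j \mid D) \;=\; \sum_{k=1}^{K} g_k(x_1,\ldots,x_K)\, P_k(\omega_j \mid x_k),
\qquad g_k \ge 0, \quad \sum_{k=1}^{K} g_k = 1.
```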

The remainder of this paper is organized as follows. Section 2 presents the methodology of the linear combination scheme through soft competition. Section 3 describes an EM learning algorithm for parameter estimation in the linear combination scheme. Section 4 reports simulation results on speaker identification. Conclusions are drawn in the final section.

Section snippets

Methodology

In this section, we first give the basic idea underlying the proposed linear combination scheme. Then we present a generalized finite mixture model as the linear combination scheme and give its implementation based on radial basis function networks.
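The implementation details are not reproduced in this snippet. As a rough illustration of how a radial basis function network can turn a feature vector into normalized combination weights, consider the minimal sketch below; the Gaussian basis functions, the softmax normalization, and all names and dimensions are our own assumptions, not the paper's architecture.

```python
import numpy as np

def rbf_gating_weights(x, centers, widths, v):
    """Minimal sketch (assumptions ours): an RBF network mapping a feature
    vector x to K non-negative combination weights that sum to one.

    x       : (d,)   input feature vector
    centers : (m, d) RBF centers
    widths  : (m,)   RBF widths
    v       : (m, K) output weights mapping basis activations to K scores
    """
    # Gaussian basis activations
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * widths ** 2))
    scores = phi @ v                      # one score per classifier/feature set
    scores = np.exp(scores - scores.max())
    return scores / scores.sum()          # softmax: a soft competition

# Toy usage (dimensions are arbitrary)
rng = np.random.default_rng(0)
w = rbf_gating_weights(rng.standard_normal(8),
                       rng.standard_normal((5, 8)),
                       np.ones(5),
                       rng.standard_normal((5, 3)))
print(w, w.sum())  # three non-negative weights summing to 1
```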

EM Algorithm for maximum-likelihood learning

In this section, we present a maximum-likelihood learning method for parameter estimation in the linear combination scheme on the basis of the Expectation–Maximization (EM) algorithm [16].

To present the EM algorithm, we assume that $N$ classifiers have already been trained on $K$ ($K \le N$) different feature sets extracted from the same training set. Given a cross-validation set $\mathcal{X} = \{(D(t), y(t))\}_{t=1}^{T}$, $K$ different feature sets, $\{x_1(t)\}_{t=1}^{T}, \ldots, \{x_K(t)\}_{t=1}^{T}$, can be extracted from the input data set $\{D(t)\}_{t=1}^{T}$ with
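The snippet breaks off before the algorithm itself. As a rough illustration of the generic shape of such an EM iteration, the sketch below re-estimates constant mixing weights from fixed classifier outputs; the paper's actual scheme uses input-dependent weights produced by gating networks, so this is a deliberate simplification, and all names are hypothetical.

```python
import numpy as np

def em_step(post, gate):
    """One EM iteration for constant mixing weights (simplified sketch).

    post : (T, K) array; post[t, k] = P_k(y(t) | x_k(t)), the k-th
           classifier's probability for the true label of sample t
    gate : (K,) current mixing weights, non-negative, summing to 1
    """
    # E-step: posterior responsibility of classifier k for sample t
    joint = gate[None, :] * post                  # (T, K)
    h = joint / joint.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights from the responsibilities
    return h.mean(axis=0)

# Toy usage with random classifier outputs (illustration only)
rng = np.random.default_rng(0)
post = rng.uniform(0.1, 1.0, size=(100, 3))
gate = np.full(3, 1.0 / 3.0)
for _ in range(20):
    gate = em_step(post, gate)
print(gate)  # learned mixing weights
```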

Simulations

In this section, we demonstrate the effectiveness of the linear combination scheme illustrated in Fig. 2. To evaluate its performance, we have applied our method to a real-world problem, speaker identification.

Speaker identification is the task of classifying an unlabeled voice token as belonging to one of a set of registered reference speakers. A speaker identification system can be either text-dependent or text-independent. By text-dependent, we mean that the text in both training and

Concluding remarks

We have presented a novel method of combining multiple probabilistic classifiers on different feature sets under the framework of linear opinion pools. In the proposed method, soft competition is adopted as an automatic feature ranking mechanism so that different feature sets are used in an optimal way. Based on the soft competition mechanism, a generalized finite mixture model is proposed as a linear combination scheme, and an EM learning algorithm is also developed for parameter estimation in the

Acknowledgements

The authors would like to thank L. Xu for a valuable discussion, L. Wang for her help with the simulations, and two anonymous referees for their constructive comments, which significantly improved the presentation of this paper. This work was partially supported by the National Science Foundation of China.


References (46)

  • C. Agnew, Multiple probability assessments by dependent experts, J. Amer. Statist. Assoc. (1985).
  • B.S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, J. Acoust. Soc. Amer. (1974).
  • R. Battiti et al., Democracy in neural nets: voting schemes for classification, Neural Networks (1994).
  • Y. Bennani, A modular and hybrid connectionist system for speaker identification, Neural Comput. (1995).
  • Y. Bennani, P. Gallinari, Connectionist approaches for automatic speaker recognition, Proc. ESCA Workshop on Automatic...
  • A. Blum, Selection of relevant features and examples in machine learning, Artif. Intell. (1997).
  • L. Breiman, Stacked regression, Tech. Rep. TR-367, Department of Statistics, University of California, Berkeley,...
  • L. Breiman, Bagging predictors, Mach. Learning (1996).
  • J.P. Campbell, Speaker recognition: a tutorial, Proc. IEEE (1997).
  • S. Chatterjee et al., On combining expert opinions, Amer. J. Math. Management Sci. (1987).
  • K. Chen et al., Methods of combining multiple classifiers with different features and their applications to text-independent speaker identification, Int. J. Pattern Recognition Artif. Intell. (1997).
  • K. Chen, D. Xie, H. Chi, Speaker identification based on hierarchical mixture of experts, Proc. World Congress on...
  • K. Chen et al., A modified HME architecture for text-dependent speaker identification, IEEE Trans. Neural Networks (1996).
  • K. Chen et al., Speaker identification using time-delay HMEs, Int. J. Neural Systems (1996).
  • K. Chen, L. Xu, H. Chi, Improved learning algorithms for mixtures of experts in multiway classification, Neural...
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. B (1977).
  • G. Doddington, Speaker recognition – identifying people by their voice, Proc. IEEE (1986).
  • R. Duda et al., Pattern Classification and Scene Analysis (1973).
  • Y. Freund, R. Schapire, Experiments with a new boosting algorithm, Proc. 13th Int. Conf. Machine Learning, 1996, pp....
  • S. Furui, An overview of speaker recognition technology, Proc. ESCA Workshop on Automatic Speaker Recognition,...
  • S. Furui, Recent advances in speaker recognition, Pattern Recognition Lett. (1997).
  • A. Gelfand et al., Modeling expert opinion arising as a partial probabilistic specification, J. Amer. Statist. Assoc. (1995).
  • S. Hashem, Optimal linear combinations of neural networks, Tech. Rep. SMS 94-4, School of Industrial Engineering,...

Ke Chen received his B.S. and M.S. in computer science from Nanjing University in 1984 and 1987, respectively, and his Ph.D. in computer science and engineering from Harbin Institute of Technology in 1990. From 1990 to 1992, he was a postdoctoral researcher at Tsinghua University. During 1992-1993 he was a postdoctoral fellow of the Japan Society for the Promotion of Science and worked at Kyushu Institute of Technology. From 1997 to 1998, he was a research scientist at The Ohio State University. He joined Peking University in 1993, and is now a full professor of Information Science. He has published over 50 technical papers in refereed journals and international conferences. His current research interests include neural computation and its applications in machine perception. Dr. Chen is a member of the IEEE and a senior member of the CIE.

Huisheng Chi graduated from the Department of Radio and Electronics at Peking University in 1964 (six-year system) and has been working at the university since then. His major research interests are in satellite communications, digital communications, and speech signal processing. In recent years, the research projects he has conducted have involved neural network auditory models and speaker identification systems. He received the Neural Network Leadership Award from the INNS in 1994 and 1995. He is a deputy president of Peking University, where he is also a full professor of information science. He serves as an associate editor of IEEE Transactions on Neural Networks. Prof. Chi is a senior member of the IEEE, a member of the appraisal groups of the NSFC and SEC, a fellow of the CIE and CIC, and a vice chairman of the CNNC and CAGIS.
