Regular Article
Discriminative feature selection for speech recognition

https://doi.org/10.1006/csla.1993.1012

Abstract

Over the last several years, a major factor in reducing the error rate of most speech recognition systems has been the addition of new feature components to the frame vectors. However, the larger dimensionality of the frame feature vector has also increased the number of model parameters and the computational requirements. To improve recognition performance, it is not feasible to increase the size of the frame feature vectors indefinitely, nor is it satisfactory or practical to run experiments on combinatorially chosen subsets of the feature set to pick the best-performing subspace with the desired number of dimensions. It is clearly desirable, therefore, to understand which components of the frame feature vector contribute most to recognition performance and to discard the least useful components. Our feature ordering method allows new sets of features to be selectively incorporated into existing signal analysis methods.

Discriminative analysis has been used successfully for hidden Markov model (HMM) parameter estimation. In this study, we use discriminative methods to perform feature selection in the frame feature space. The components of the feature vectors are rank ordered according to an objective criterion, and only the most "significant" components are used for recognition. The proposed feature reduction method has been applied to a 38-dimensional vector consisting of the first- and second-order time derivatives of the frame energy, together with the cepstral coefficients and their first and second derivatives. Speaker-independent recognition experiments with reduced feature sets were performed on three databases of high-quality non-telephone speech and on a database of telephone speech recorded during a field trial. The dimension of the frame feature vectors, and hence the number of model parameters, was greatly reduced (by a factor of two) without a significant loss of recognition performance.
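The abstract does not spell out the objective criterion used for rank ordering. As an illustrative sketch only (not the paper's actual discriminative method), one common criterion of this kind is a per-dimension Fisher discriminant ratio: the variance of the class means divided by the average within-class variance, computed independently for each feature component. The function names below (`fisher_ratio_ranking`, `select_top_k`) are hypothetical.

```python
import numpy as np

def fisher_ratio_ranking(X, y):
    """Rank feature dimensions by a Fisher-style discriminant ratio.

    X: (n_frames, n_features) array of frame feature vectors.
    y: (n_frames,) array of class labels (e.g. phone or word classes).
    Returns (order, scores), where order lists feature indices from
    most to least discriminative. Illustrative criterion only.
    """
    classes = np.unique(y)
    class_means = np.array([X[y == c].mean(axis=0) for c in classes])
    class_vars = np.array([X[y == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)     # spread of class means, per feature
    within = class_vars.mean(axis=0)      # average within-class variance
    scores = between / (within + 1e-12)   # guard against division by zero
    order = np.argsort(scores)[::-1]      # most discriminative first
    return order, scores

def select_top_k(X, order, k):
    """Keep only the k highest-ranked components of each frame vector."""
    return X[:, order[:k]]
```

Under this scheme, halving the feature dimension (as in the experiments above) amounts to calling `select_top_k(X, order, X.shape[1] // 2)` and retraining the models on the reduced vectors.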


Cited by (27)

  • Feature selection for reduced-bandwidth distributed speech recognition

    2012, Speech Communication
    Citation Excerpt :

    In a multi-speaker isolated digit recognition task the results indicated that it is possible to reduce the feature vector size without impacting the recognition performance. Bocchieri and Wilpon (1993) use discriminative analysis to select a subset of coefficients from the feature vectors in continuous speech recognition experiments. Nicholson et al. (1997) measured the correlation between MFCC feature sets that have good class discrimination and those that produced good speech recognition results.

  • A two level strategy for audio segmentation

    2011, Digital Signal Processing: A Review Journal
  • Combining acoustic and articulatory feature information for robust speech recognition

    2002, Speech Communication
    Citation Excerpt :

    The MFCCs which were eliminated in favor of these AFs are the first derivative of the 12th cepstral coefficient and the second derivatives of the 4th, 6th, 7th, 9th, 11th and 12th cepstral coefficients. This result confirms the low relevance of delta–delta coefficients observed previously (Bocchieri and Wilpon, 1993) and the importance of the place of articulation dimension generally acknowledged in phonetics. The resulting 39-dimensional feature set was used to train another recognition system with a 256-class full-covariance codebook.
