Elsevier

Speech Communication

Volume 38, Issues 3–4, November 2002, Pages 399-411
Phoneme-group specific octave-band weights in predicting speech intelligibility

https://doi.org/10.1016/S0167-6393(02)00011-0

Abstract

In an earlier study we derived robust frequency-weighting functions for predicting the intelligibility of short nonsense words. Such frequency-weighting functions are used in objective intelligibility measures such as the speech transmission index (STI). Six independent experiments revealed essentially similar frequency-weighting functions for the prediction of the nonsense word scores with respect to signal-to-noise ratio and gender [Speech Communication 28 (1999) 109]. Although the frequency weightings do not vary significantly with signal-to-noise ratio or gender, other studies have shown that using different types of speech material (i.e., nonsense words, phonetically balanced words and connected discourse) results in quite different frequency-weighting functions. This may be related to the distribution of specific phonemes in the test material. In order to obtain a more generic description of the frequency weighting, four relevant groups of phonemes were identified. In situations with reduced intelligibility, a low confusion rate between phonemes of different groups and a high confusion rate between phonemes within each group were observed. For each group a specific frequency-weighting function and a good prediction of the phoneme group scores could be obtained. It was shown that from these (weighted) phoneme group scores, word scores could be predicted with a prediction accuracy of ca. 4% (this corresponds to a change in signal-to-noise ratio of about 1 dB). Hence, this method provides a more generic way to predict intelligibility scores for different types of speech material.

Introduction

Octave-band weighting functions represent the contribution of each octave band to the intelligibility of a speech signal. In an earlier study it was found that these weighting functions are robust with respect to signal-to-noise ratio and gender of the speaker (Steeneken, 1992; Steeneken and Houtgast, 1999). However, experiments based on different types of speech material showed quite different frequency-weighting functions (French and Steinberg, 1947; Steeneken and Houtgast, 1980; Steeneken and Houtgast, 1999; Pavlovic, 1987; Studebaker et al., 1987; Duggirala et al., 1988). This may be related to the specific distribution of phonemes in the test material, since the frequency-weighting functions do vary significantly according to phonetic content. In Fig. 1 typical weighting functions are given that are derived from two standards on the objective prediction of speech intelligibility (speech transmission index, STI, described by IEC 60268-16, 1998; speech intelligibility index, SII, by ANSI S3.5, 1997) and for consonants and vowels from a study by Steeneken (1992). Fig. 1 shows a large difference between the curves for consonants and for vowels. The weighting function for the vowels has a maximum for the contributions in the 0.5 and 2 kHz octave bands. Consonants and equally balanced CVC words (words of the type consonant–vowel–consonant with an equally balanced phoneme distribution) cover a wider frequency range (125 Hz–8 kHz). Obviously the octave-band contributions depend on the type of speech considered.
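The idea of an octave-band weighting function can be illustrated with a minimal sketch. It assumes a simplified STI-like index: each band's signal-to-noise ratio is clipped to a range of -15 to +15 dB, mapped linearly to a value between 0 and 1, and the per-band values are combined as a weighted sum. The weight sets below are invented for illustration and are not the values plotted in Fig. 1; the function names are hypothetical, and the inter-band dependency discussed in Steeneken and Houtgast (1999) is deliberately left out.

```python
# Octave-band centre frequencies (Hz) covering 125 Hz to 8 kHz.
BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

# HYPOTHETICAL weight sets, for illustration only (not taken from the paper).
NARROW_WEIGHTS = [0.0, 0.1, 0.35, 0.35, 0.15, 0.05, 0.0]  # concentrated mid-band
BROAD_WEIGHTS = [1.0 / 7.0] * 7                           # flat over all bands


def transmission_index(snr_db, lo=-15.0, hi=15.0):
    """Map a band SNR (dB) linearly onto [0, 1], clipping outside lo..hi.

    The -15..+15 dB range is an assumption of this sketch.
    """
    clipped = max(lo, min(hi, snr_db))
    return (clipped - lo) / (hi - lo)


def weighted_index(snr_per_band_db, weights):
    """Weighted sum of per-band transmission indices (weights are normalised)."""
    if len(snr_per_band_db) != len(weights):
        raise ValueError("need one SNR value per weight")
    total = sum(weights)
    return sum(
        (w / total) * transmission_index(snr)
        for w, snr in zip(weights, snr_per_band_db)
    )


# A made-up high-pass condition: the four lowest bands carry no usable speech.
high_pass = [-15.0, -15.0, -15.0, -15.0, 15.0, 15.0, 15.0]
print("narrow weighting:", round(weighted_index(high_pass, NARROW_WEIGHTS), 2))
print("broad weighting: ", round(weighted_index(high_pass, BROAD_WEIGHTS), 2))
```

For the same band-limited condition the two weight sets give clearly different index values, which is the sense in which the predicted intelligibility depends on the weighting function chosen.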

This is in agreement with the differences found for the effect of various types of distortions on the intelligibility of vowels and consonants. This is illustrated in Fig. 2, showing a scatter diagram of the initial consonant score versus vowel scores for male speech in 78 transmission conditions with various combinations of bandwidth and signal-to-noise ratio.

For diagnostic assessment of speech communication systems it is of interest to consider not only the overall performance derived from a specific intelligibility test (i.e., related to the speech material) but also to identify the performance for specific phonemes or groups of phonemes (Miller and Nicely, 1955). For example, the standard for the SII recommends six groups of frequency-weighting factors for prediction of different subjective intelligibility measures.

From an experiment with CVC-word tests we obtained confusions among consonants and among vowels for many different transmission conditions. For the consonants, a clustering of three groups of consonants with many intra-group confusions was found (fricatives, plosives, and vowel-like consonants). The phonemes within each of these groups show a quite similar response for various types of degradation, and confusions are mainly between phonemes within each group (Steeneken, 1992). This is given in Table 1 for 17 representative Dutch initial consonants obtained for male speech and 26 different combinations of band-pass limiting.

In the table the phonemes (SAMPA notation, 1987) that show many mutual confusions are grouped together. This results in a clustering of the plosives (p, t, k, b, d), the fricatives (f, s, v, z, x), and the vowel-like consonants (m, n, l, R, w, j, h). Some confusions are found between phonemes belonging to different clusters: between f, s and p, t; between v, z and w, j, h; and between b and w.
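The distinction between intra-group and inter-group confusions can be made concrete with a short sketch. The three consonant groups are those listed above; the confusion counts are synthetic and chosen only to show the bookkeeping, and the function name `confusion_split` is hypothetical.

```python
# Consonant clusters as listed in the text (SAMPA symbols).
GROUPS = {
    "plosives": ["p", "t", "k", "b", "d"],
    "fricatives": ["f", "s", "v", "z", "x"],
    "vowel-like": ["m", "n", "l", "R", "w", "j", "h"],
}

# Reverse lookup: phoneme -> group name.
GROUP_OF = {ph: name for name, members in GROUPS.items() for ph in members}


def confusion_split(confusions):
    """Count errors within a group and between groups.

    `confusions` maps (presented, responded) pairs to counts.
    Correct responses (presented == responded) are ignored.
    """
    within = between = 0
    for (presented, responded), count in confusions.items():
        if presented == responded:
            continue
        if GROUP_OF[presented] == GROUP_OF[responded]:
            within += count
        else:
            between += count
    return within, between


# SYNTHETIC counts, not data from Table 1.
example = {
    ("p", "p"): 80,  # correct response, ignored
    ("p", "t"): 12,  # plosive -> plosive
    ("f", "s"): 9,   # fricative -> fricative
    ("m", "n"): 15,  # vowel-like -> vowel-like
    ("f", "p"): 2,   # fricative -> plosive
    ("b", "w"): 1,   # plosive -> vowel-like
}

within, between = confusion_split(example)
print("within-group errors:", within)
print("between-group errors:", between)
```

A grouping is useful for diagnostic purposes when the within-group count dominates, as it does in this made-up example.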

A similar representation for 15 Dutch vowels derived from the same set of transmission conditions did not show a systematic clustering. Hence, for the determination of (Dutch) phoneme-specific octave-band weights, a total of four clusters of phonemes is considered: fricatives, plosives, vowel-like consonants and vowels. These clusters of frequently used phonemes (>2% for the Dutch language) consist of 17 initial consonants and 15 vowels. For reasons of simplicity the final consonants were not considered separately, as these consonants (11) are mainly a subset of the initial consonants.
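How scores for the four phoneme groups might be combined into a word score can be sketched as follows. This is not the prediction model of the paper. It rests on two simplifying assumptions made here for illustration: the consonant score is the mean of the three consonant group scores weighted by the number of phonemes in each group (5, 5 and 7 of the 17 initial consonants), and a CVC word is correct only if all three phonemes are correct, with independent errors and the final consonant treated like the initial one. The group scores are invented.

```python
# Number of initial consonants per group, from the clustering in the text.
GROUP_SIZES = {"plosives": 5, "fricatives": 5, "vowel-like": 7}

# HYPOTHETICAL group scores (fraction correct) for one transmission condition.
EXAMPLE_SCORES = {"plosives": 0.80, "fricatives": 0.60, "vowel-like": 0.90}


def consonant_score(group_scores, group_sizes):
    """Mean consonant score, weighting each group by its share of phonemes.

    Assumes every phoneme occurs equally often (equally balanced material).
    """
    total = sum(group_sizes.values())
    return sum(group_scores[g] * n for g, n in group_sizes.items()) / total


def cvc_word_score(consonant, vowel):
    """Word score under the independence assumption: C correct, V correct, C correct."""
    return consonant * vowel * consonant


c = consonant_score(EXAMPLE_SCORES, GROUP_SIZES)
print("consonant score:", round(c, 3))
print("CVC word score: ", round(cvc_word_score(c, vowel=0.95), 3))
```

Because the word score is a product of phoneme scores, a modest loss in one group (here the fricatives) lowers the word score considerably, which is one reason group-level scores are informative for diagnostic assessment.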

Section snippets

Experimental design

For the determination of the octave-band-specific frequency weighting of each phoneme group, both the phoneme scores and the related octave-band-specific signal-to-noise ratios are required for a large number of different conditions. These data can be obtained by making use of a universal communication channel of which the transfer conditions (i.e., bandwidth, additive noise type, and signal-to-noise ratio) can be adjusted. For each condition the phoneme-group-specific score and the mean

Experimental results

The experiments were based on the determination of the subjective and objective transmission quality of 78 transmission conditions for male speech and 51 transmission conditions for female speech. These transmission conditions were combinations of band-pass limiting (respectively 26 for male and 17 for female) and noise (3 signal-to-noise ratios). The subjective data included the individual phoneme-group scores and the CVC scores for male and female speakers. The objective data included the

Frequency weighting with respect to the type of speech

Several different frequency-weighting factors to predict speech intelligibility have been found in various studies. As shown in Fig. 1, these are all related to different types of speech. The goal of this study is to develop a more generic model for general application. There are many differences between the studies that derived the frequency-weighting functions, in particular with respect to the method by which the frequency-weighting function is derived from the subjective scores and the

Conclusions

The frequency-weighting functions used for the objective prediction of speech intelligibility depend on the type of speech material used for the development of such a method (Fig. 1). This study focused on developing a more generic model for the objective prediction of speech intelligibility, independent of the type of speech. For this purpose four phoneme groups were used (fricatives, plosives, vowel-like consonants, and vowels) for which four different sets of frequency-weighting functions

References (16)

  • H.J.M Steeneken et al. Mutual dependency of the octave-band weights in predicting speech intelligibility. Speech Communication (1999)
  • ANSI S3.5, 1997. American National Standard, Methods for the calculation of the speech intelligibility index. Standards...
  • A.W Bronkhorst et al. A model for context effects in speech recognition. J. Acoust. Soc. Amer. (1992)
  • V Duggirala et al. Frequency importance functions for a feature recognition test material. J. Acoust. Soc. Amer. (1988)
  • N.R French et al. Factors governing the intelligibility of speech sounds. J. Acoust. Soc. Amer. (1947)
  • IEC International Standard, 1998. Sound system equipment – Part 16. Objective rating of speech intelligibility by...
  • T Houtgast et al. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J. Acoust. Soc. Amer. (1985)
  • G.A Miller et al. An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Amer. (1955)
There are more references available in the full text version of this article.
