Phoneme-group specific octave-band weights in predicting speech intelligibility
Introduction
Octave-band weighting functions represent the contribution of each octave band to the intelligibility of a speech signal. An earlier study found that these weighting functions are robust with respect to signal-to-noise ratio and gender of the speaker (Steeneken, 1992; Steeneken and Houtgast, 1999). However, experiments based on different types of speech material yielded quite different frequency-weighting functions (French and Steinberg, 1947; Steeneken and Houtgast, 1980; Steeneken and Houtgast, 1999; Pavlovic, 1987; Studebaker et al., 1987; Duggirala et al., 1988). This may be related to the specific distribution of phonemes in the test material, since the frequency-weighting functions vary significantly with phonetic content. Fig. 1 gives typical weighting functions derived from two standards on the objective prediction of speech intelligibility (speech transmission index, STI, described by IEC 60268-16, 1998; speech intelligibility index, SII, by ANSI S3.5, 1997), together with the functions for consonants and vowels from a study by Steeneken (1992). Fig. 1 shows a large difference between the curves for consonants and for vowels. The weighting function for the vowels has a maximum for the contributions in the 0.5 and 2 kHz octave bands. Consonants and equally balanced CVC words (words of the type consonant–vowel–consonant with an equally balanced phoneme distribution) cover a wider frequency range (125 Hz–8 kHz). The octave-band contributions therefore depend on the type of speech considered.
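The role of a weighting function can be illustrated with a minimal sketch in Python. It assumes the simplest additive form, in which an index is a weighted sum of per-band contributions with weights that sum to 1; the standards cited above include further terms that are not modelled here. The weight values, the function names, and the low-pass channel are hypothetical and are not taken from the paper or from either standard.

```python
# Octave-band centre frequencies (Hz) covering the 125 Hz to 8 kHz range.
BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def weighted_index(band_values, weights):
    """Weighted sum of per-band contributions, each expected in the range 0..1."""
    if len(band_values) != len(weights):
        raise ValueError("need one weight per octave band")
    return sum(w * v for w, v in zip(normalize(weights), band_values))


# Hypothetical weights for illustration only (assumption, not measured data).
BROAD_WEIGHTS = [1, 1, 1, 1, 1, 1, 1]  # flat: every band counts equally
MID_WEIGHTS = [0, 1, 3, 3, 3, 1, 0]    # concentrated around 0.5 to 2 kHz

# Hypothetical low-pass channel: bands at 2 kHz and above contribute nothing.
low_pass = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

broad_result = weighted_index(low_pass, BROAD_WEIGHTS)
mid_result = weighted_index(low_pass, MID_WEIGHTS)
print(f"flat weights: {broad_result:.3f}, mid-band weights: {mid_result:.3f}")
```

The same channel yields a different predicted value under each set of weights, which is why the choice of weighting function matters for the type of speech being assessed.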
This agrees with the differences found in the effect of various types of distortion on the intelligibility of vowels and consonants. Fig. 2 illustrates this with a scatter diagram of initial-consonant scores versus vowel scores for male speech in 78 transmission conditions with various combinations of bandwidth and signal-to-noise ratio.
For diagnostic assessment of speech communication systems it is of interest to consider not only the overall performance derived from a specific intelligibility test (i.e., related to the speech material) but also to identify the performance for specific phonemes or groups of phonemes (Miller and Nicely, 1955). For example, the standard for the SII recommends six groups of frequency-weighting factors for prediction of different subjective intelligibility measures.
From an experiment with CVC-word tests we obtained confusions among consonants and among vowels for many different transmission conditions. For the consonants, three clusters with many intra-group confusions were found (fricatives, plosives, and vowel-like consonants). The phonemes within each of these groups respond quite similarly to various types of degradation, and confusions occur mainly between phonemes within the same group (Steeneken, 1992). These confusions are given in Table 1 for 17 representative Dutch initial consonants, obtained for male speech and 26 different combinations of band-pass limiting.
In the table the phonemes (SAMPA notation, 1987) that show many mutual confusions are grouped together. This results in a clustering of the plosives (p, t, k, b, d), the fricatives (f, s, v, z, x), and the vowel-like consonants (m, n, l, R, w, j, h). Some confusions are found between phonemes belonging to different clusters: f, s → p, t; v, z → w, j, h, and b → w.
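The degree of clustering can be expressed as the share of confusions that stay within a group. The following Python sketch uses the three consonant groups listed above; the confusion counts and the function name `intra_group_share` are hypothetical and serve only to illustrate the calculation, not to reproduce Table 1.

```python
# Consonant groups as listed in the text (SAMPA symbols).
GROUPS = {
    "plosives": ["p", "t", "k", "b", "d"],
    "fricatives": ["f", "s", "v", "z", "x"],
    "vowel-like": ["m", "n", "l", "R", "w", "j", "h"],
}
# Reverse lookup: phoneme -> group name.
GROUP_OF = {ph: name for name, members in GROUPS.items() for ph in members}


def intra_group_share(confusions):
    """Fraction of erroneous responses that fall within the presented phoneme's group.

    `confusions` maps (presented, responded) pairs to counts.
    Correct responses (presented == responded) are ignored.
    """
    within = 0
    total = 0
    for (presented, responded), count in confusions.items():
        if presented == responded:
            continue
        total += count
        if GROUP_OF[presented] == GROUP_OF[responded]:
            within += count
    return within / total if total else 0.0


# Hypothetical counts for illustration only (assumption, not data from the study).
toy_confusions = {
    ("p", "p"): 40, ("p", "t"): 6, ("p", "k"): 3,
    ("f", "s"): 5, ("f", "p"): 2,
    ("m", "n"): 7, ("b", "w"): 1,
}

share = intra_group_share(toy_confusions)
print(f"intra-group share of confusions: {share:.3f}")
```

A share close to 1 indicates that errors rarely cross group boundaries, which is the pattern that motivates treating each group as one unit.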
A similar representation for 15 Dutch vowels derived from the same set of transmission conditions did not show a systematic clustering. Hence, for the determination of (Dutch) phoneme-specific octave-band weights, four clusters of phonemes are considered in total: fricatives, plosives, vowel-like consonants, and vowels. These clusters of frequently used phonemes (>2% in the Dutch language) consist of 17 initial consonants and 15 vowels. For simplicity the final consonants were not considered separately, as these 11 consonants are mainly a subset of the initial consonants.
Section snippets
Experimental design
For the determination of the octave-band-specific frequency weighting of each phoneme group, both the phoneme scores and the related octave-band-specific signal-to-noise ratios are required for a large number of different conditions. These data can be obtained by making use of a universal communication channel of which the transfer conditions (i.e., bandwidth, additive noise type, and signal-to-noise ratio) can be adjusted. For each condition the phoneme-group-specific score and the mean
Experimental results
The experiments were based on the determination of the subjective and objective transmission quality of 78 transmission conditions for male speech and 51 transmission conditions for female speech. These transmission conditions were combinations of band-pass limiting (respectively 26 for male and 17 for female) and noise (3 signal-to-noise ratios). The subjective data included the individual phoneme-group scores and the CVC scores for male and female speakers. The objective data included the
Frequency weighting with respect to the type of speech
Various studies have found different frequency-weighting factors for predicting speech intelligibility. As shown in Fig. 1, these are all related to different types of speech. The goal of this study is to develop a more generic model for general application. There are many differences between the studies that derived the frequency-weighting functions, in particular with respect to the method by which the frequency-weighting function is derived from the subjective scores and the
Conclusions
The frequency-weighting functions used in the objective prediction of speech intelligibility depend on the type of speech material used to develop the method (Fig. 1). This study aimed to develop a more generic model for the objective prediction of speech intelligibility, independent of the type of speech. For this purpose four phoneme groups were used (fricatives, plosives, vowel-like consonants, and vowels) for which four different sets of frequency-weighting functions
References (16)
- et al. Mutual dependency of the octave-band weights in predicting speech intelligibility. Speech Communication (1999)
- ANSI S3.5, 1997. American National Standard, Methods for the calculation of the speech intelligibility index. Standards...
- et al. A model for context effects in speech recognition. J. Acoust. Soc. Amer. (1992)
- et al. Frequency importance functions for a feature recognition test material. J. Acoust. Soc. Amer. (1988)
- et al. Factors governing the intelligibility of speech sounds. J. Acoust. Soc. Amer. (1947)
- IEC International Standard, 1998. Sound system equipment – Part 16. Objective rating of speech intelligibility by...
- et al. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J. Acoust. Soc. Amer. (1985)
- et al. An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Amer. (1955)