Speech Communication

Volume 105, December 2018, Pages 1-11

A new verification of the speech transmission index for the English language

https://doi.org/10.1016/j.specom.2018.10.005

Abstract

The speech transmission index (STI) is one of the most widely used and standardized methods for objective prediction of speech intelligibility of transmission channels. The original verification of the relationship between the STI and intelligibility for the English language was published in 1987. The methodology employed then for the listening tests, and the different input spectrum recommended by the current STI method, suggest that the relationship between STI and speech intelligibility needs to be verified again for the English language. This paper presents a new verification of the current STI for the English language with binaural listening and the speech materials presented to the listeners in a real room from a real sound source. Two hundred and ten subjects participated in speech intelligibility tests designed to replicate real-life listening conditions. The speech materials were contaminated with natural reverberation, noise, band pass limiting and echoes, and the listeners were not familiarized with the speech materials before the tests, in order to better replicate everyday situations. Results showed lower intelligibility scores than the earlier verification presented in Annex E of the current STI standard (IEC60268–16:2011) for most of the scenarios that were investigated. The correlation between STI and speech intelligibility was also investigated for the English language with a newly proposed male speech spectrum. The accuracy of STI intelligibility prediction with the new male spectrum was found to be higher than that attained with the current IEC specified male spectrum. These findings give additional insights into the interpretation of intelligibility using STI values.

Introduction

The speech transmission index (STI) (Steeneken and Houtgast, 1980) is one of the most widely used and standardized methods for objective prediction of intelligibility of speech transmission channels. Since its introduction in the 1970s, the STI method has undergone continuous evolution and improvements, which have been reflected in successive revisions of the IEC60268–16 standard “Objective rating of speech intelligibility by speech transmission index”.

Extensive work has been undertaken to verify the correlation of the speech transmission index with speech intelligibility by comparing STI results with subjectively determined intelligibility scores. The relationship between speech intelligibility and the STI was investigated for the Dutch language in 1999 and in 2002 (Steeneken and Houtgast, 1999, Steeneken and Houtgast, 2002) and also for the English language in 1987 (Anderson and Kalb, 1987); both relationships are included in an Informative Annex of the current STI standard IEC60268–16:2011 (IEC, 2011). For specific STI values, substantial differences in the intelligibility scores can be found between the two relationships (IEC, 2011), which are primarily due to the different test methodologies and speech materials employed: Harvard phonetically balanced (PB) word lists were used for the English verification (Anderson and Kalb, 1987) and nonsense consonant-vowel-consonant (CVC) words were used for the Dutch validation (Steeneken and Houtgast, 2002). Intelligibility scores with nonsense words are expected to be lower than those with actual words (Egan, 1948). The word “verification” was employed for the English language based work and the word “validation” was employed for the Dutch language works.

In addition to the validations included in IEC60268–16:2011 (IEC, 2011), a simplified version of the STI (RASTI) was also verified for eleven languages (Houtgast and Steeneken, 1984). Relationships between speech intelligibility and the STI have also been investigated for the English and Chinese languages (Kang, 1998) and, more recently, intelligibility was compared across several languages for several STI values (Galbrun and Kitapci, 2016). The most recent validation of the STI included in the IEC standard (IEC, 2011) was published for the Dutch language in 2002 (Steeneken and Houtgast, 2002). In this validation, higher speech intelligibility scores were found for female talkers than for male talkers.

In accordance with these findings, the current STI method included in the latest version of IEC60268–16:2011 (IEC, 2011) has adopted the male speech spectrum in order to consider the worst-case scenario for speech intelligibility. An STI speech intelligibility prediction with a female speech spectrum can also be performed with the current STI method, but the use of a female spectrum must be reported.

In the 1987 STI verification for the English language (Anderson and Kalb, 1987), female and male spectra were averaged for the STI calculations. The computation of the STI values apparently did not consider the current male speech spectrum based octave-band weightings and redundancy factors that are now utilised in STI calculations. Despite these discrepancies, the correlation between PB-word scores and STI from Anderson and Kalb is still presented in Figure E1 of Informative Annex E in the 2011 STI standard (IEC, 2011) unamended. It is therefore useful for a new verification to give insights into how the STI scores attained following the current STI method correlate with subjectively determined intelligibility.

Anderson and Kalb (1987), referred to below as AK87, verified the STI for the English language with Harvard phonetically balanced (PB) word lists (Egan, 1948). The PB corpus comprises 20 lists of 50 phonetically balanced monosyllabic English words. In the PB word lists, the monosyllabic words have been selected so that they approximate the relative frequency of phoneme occurrence in the English language. In the AK87 verification, the PB word lists were originally recorded with a single microphone by three male and two female speakers. Electronic reverberation, noise and band pass limiting were employed to contaminate the original speech recordings. In the AK87 intelligibility tests, the contaminated word lists were presented to three male and three female listeners through headphones in a monaural format (diotic presentation). The listeners were familiarized with all the words so that, under the best listening conditions, they were consistently able to identify a minimum of 95% of the words. Similar intelligibility scores were found for male and female talkers. The STI calculations employed an input spectrum obtained by averaging male and female spectra.

The effects of listener training on intelligibility test results have been widely investigated using monosyllabic words (Egan, 1948, Moser and Dreher, 1955, Schwab et al., 1985, Burk et al., 2006). All these works found increases of between 10% and 70% in intelligibility scores due to training. Recently, Kondo (2012) investigated the effect of word familiarity on speech intelligibility by employing monosyllabic words (Diagnostic Rhyme Tests). For quiet conditions, a difference of approximately 3–4% in the intelligibility scores was found between high familiarity and low familiarity words. A difference of approximately 10% was also reported between familiar and non-familiar words for conditions of 10 dB SNR, and the difference exceeded 20% for SNRs of 0 dB or less. Although these studies employed different metrics to assess the effects of training, they suggest that procedural learning and familiarity can each lead to an increase in word intelligibility score of up to 10% (20% in total), with the effects materializing over several sessions. They also indicate that word familiarity can lead to an increase in intelligibility at least as great as that due to procedural learning. These learning effects were found to be higher for noisy conditions than for uncontaminated speech.

Additionally, the validations provided for the English language employed subjects listening to monaural speech presented through headphones. This could have led to mismatches in correlating the STI with measured speech intelligibility scores, due to potential limitations of headphone reproduction, which have been reported by Vasiliauskas et al. (2010). Kondo (2012) also showed small differences between intelligibility obtained with real sources in real space and that obtained with binaural recordings presented to the listeners through headphones. More recently, Morales et al. (2014) found statistically notable differences between intelligibility scores obtained in “real-life” conditions (real room and with a real sound source) and those obtained with binaural recordings presented to the listeners through headphones.

An extended version of the STI which included binaural hearing in its calculations was proposed by Wijngaarden and Drullman (2008). This STI model indicated good prediction accuracy for a variety of conditions including reverberation and noise, and proposed the use of a better-ear binaural STI, which is now recommended in IEC60268–16:2011. Both monaural and better-ear binaural STI calculations were employed in this study.

The goal of the present paper is to present a verification of the current revised STI method for the English language with a real-life binaural listening method. This method was thought to replicate a real-world listening scenario with the listeners present in a real room, and within the real sound field at the time of the intelligibility tests. Two hundred and ten subjects listened to speech contaminated with reverberation, noise, band pass limiting and echoes.

STI measurements and listening tests

STI measurements and intelligibility tests were mostly carried out in a room with reverberation that could be varied by introducing different amounts of absorption. An anechoic room was also employed for several scenarios. The intelligibility tests employed Harvard PB word lists and the listeners were unfamiliar with the words.

Results

Fig. 3 shows the relationship between the PB-word score results and the measured STI for the 42 scenarios investigated. Each data point represents the average obtained for each STI scenario. The graph also includes the curve derived from the relationship between the STI and the standard PB word from Anderson and Kalb (1987) referred to in IEC60268–16:2011 (IEC, 2011).
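The averaging behind each data point can be illustrated with a minimal sketch. It assumes that a PB-word score is the percentage of presented words a listener identifies correctly, and that each scenario's data point is the mean of its listeners' scores paired with that scenario's measured STI. The function names, the word lists and all numbers below are hypothetical placeholders and are not taken from the study or from the Harvard lists.

```python
from collections import defaultdict
from statistics import mean


def pb_word_score(presented, responses):
    """Percentage of presented words that the listener reported correctly."""
    if len(presented) != len(responses):
        raise ValueError("one response is expected per presented word")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(presented, responses)
    )
    return 100.0 * correct / len(presented)


def scenario_averages(trials):
    """Average listener scores per scenario.

    trials: iterable of (scenario_id, sti, score_percent) tuples.
    Returns a list of (sti, mean_score) pairs sorted by STI.
    """
    scores = defaultdict(list)
    sti_of = {}
    for scenario_id, sti, score in trials:
        scores[scenario_id].append(score)
        sti_of[scenario_id] = sti
    return sorted((sti_of[s], mean(v)) for s, v in scores.items())


# Placeholder words and responses (hypothetical, not from the PB corpus).
presented = ["bat", "cane", "dish", "rope"]
responses = ["bat", "came", "dish", "rope"]
score = pb_word_score(presented, responses)

# Placeholder trials: two hypothetical scenarios with two listeners each.
trials = [
    ("A", 0.30, 40.0),
    ("A", 0.30, 50.0),
    ("B", 0.60, 80.0),
    ("B", 0.60, 90.0),
]
points = scenario_averages(trials)
```

Each `(sti, mean_score)` pair in `points` corresponds to one plotted point of the kind described for Fig. 3; the exact-match scoring rule is a simplification and the study's own marking procedure may differ.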

The validity of results presented in this section is limited to the STI values investigated (0.20–1.00). As shown in Fig. 3, the

Discussion

The results of this study illustrated in Fig. 3 show lower averaged PB scores than those indicated by the standard PB curve (AK87 and IEC60268–16) for values of STI below 0.80. For values of STI below 0.80, the difference between the scores increases as the STI decreases. Above an STI of approximately 0.80, the relationships between the averaged PB scores and the STI are similar for both studies. This pattern indicates that for a given STI value, a lower level of speech intelligibility than

Background

Recent research (Morales et al., 2018), proposed a new male spectrum for use with STI calculations, which differed substantially from that proposed by the current STI method (IEC, 2011). The long-term average speech spectrum (LTASS) for the British English language was obtained for 40 male talkers and proposed for STI calculations. Three Harvard phonetically-balanced sentence lists were used to obtain the speech spectra and the LTASS was obtained in 1/3rd octaves for frequencies between 100 Hz

Conclusions

A verification of the revised STI for a male spectrum was undertaken for the English language, using phonetically balanced word lists contaminated with natural reverberation, noise, band pass limiting and echoes. The listeners were not familiarized with the words and they listened to the speech material in a real room with a real sound source (binaural listening). For most of the studied scenarios, the results showed a lower averaged PB-word score than indicated by the Standard PB curve.

Acknowledgments

The current study was mostly carried out at London South Bank University as part of the work required to fulfill the requirements of a doctorate programme. The authors would like to thank all the listeners and all the subjects who volunteered for the speech recordings, and also Stephen Dance and Bridget Shield for their continuous support.

References (33)

  • J.P. Egan. Articulation testing methods. Laryngoscope (1948)
  • H.A. Gustafsson et al. Masking of speech by amplitude-modulated noise. J. Acoust. Soc. Am. (1994)
  • K.S. Helfer. Binaural cues and consonant perception in reverberation and noise. J. Speech Hear. Res. (1994)
  • T. Houtgast et al. A multi-language evaluation of the RASTI-method for estimating speech intelligibility in auditoria. Acustica (1984)
  • Sound System equipment. Part 16: Objective rating of Speech Intelligibility by Speech Transmission Index (2011)
  • Measurement of Reverberation Time with Reference to Other Room Acoustical Parameters (2000)