Elsevier

Speech Communication

Volume 47, Issue 4, December 2005, Pages 411-423
New Turkish intelligibility test for assessing speech communication systems

https://doi.org/10.1016/j.specom.2005.04.005

Abstract

This article describes a Turkish Intelligibility Test (TIT), based on Turkish phonetic properties, for evaluating the Quality of Service (QoS) of speech communication systems (SCSs). Since widely used speech communication systems are generally developed with one language in mind, the need to broaden linguistic coverage when designing next generation SCSs calls for assessment methods in many widely used languages, including Turkish. In this article, the selection of TIT material in light of Turkish phonetic characteristics is discussed, and the conduct of TIT, including recording, preparation, and presentation of the test material, is detailed. Experimental results, which present subjective intelligibility assessments of three well-known speech coders, are compared with the results of the Diagnostic Rhyme Test (DRT) in North American English. Intelligibility assessment via TIT is expected to play a leading and critical role in improving the intelligibility of Turkish speech processed by SCSs under varying acoustic environments.

Introduction

Speech communication via electronic devices is one of the most important features of the information age; for instance, a typical service user may make and receive calls daily across different local operators, channels, and even countries. A close look at the speech transmission chain reveals dozens of possible sources of degradation, such as the acoustic environments at both ends, the conversion of sound to an electrical signal, and so forth. These sources affect the main quality of service (QoS) parameters differently. Intelligibility of speech transmission, which is the focus of this study, is one of those main parameters. Speech intelligibility can be assessed in three ways: prediction, objective methods, and subjective tests.

For prediction, possible intelligibility losses are estimated by taking the physical and perceptual properties of a transmission system into account. Pioneering work in this field inspired the Articulation Index (AI) method, in which intelligibility is measured as the information content (in terms of signal-to-noise ratio, SNR) carried in 20 equally weighted frequency bands defined across the speech spectrum (Steeneken, 1992). AI includes articulation tables for syllables of the consonant–vowel–consonant (CVC) type. Note that the syllable tables do not contain vowels in nonsense words or in unstressed positions.
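The band-based idea behind AI can be illustrated with a minimal sketch. It assumes that each band's SNR is clipped to a 30 dB useful range (here -12 dB to +18 dB, a value taken from classical AI formulations rather than from this article), mapped linearly to a contribution between 0 and 1, and averaged with equal weights. The function name and parameters are hypothetical.

```python
def articulation_index(band_snr_db, floor_db=-12.0, ceil_db=18.0):
    """Simplified AI: equal-weight mean of per-band SNR contributions.

    floor_db and ceil_db are assumed limits of the useful SNR range;
    they are not specified in the article.
    """
    if not band_snr_db:
        raise ValueError("need at least one band SNR")
    span = ceil_db - floor_db
    contributions = []
    for snr in band_snr_db:
        # Clip each band SNR to the useful range, then scale to [0, 1].
        clipped = min(max(snr, floor_db), ceil_db)
        contributions.append((clipped - floor_db) / span)
    # Equal weighting: every band counts the same.
    return sum(contributions) / len(contributions)


quiet = [30.0] * 20  # every band well above the ceiling
noisy = [3.0] * 20   # every band in the middle of the range
print(articulation_index(quiet))  # 1.0
print(articulation_index(noisy))  # 0.5
```

With equal weights, a band saturates once its SNR exceeds the ceiling, so further noise reduction in that band no longer raises the index.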

Objective methods, on the other hand, rely not only on model-based calculations, as the prediction approach does, but also on the physical properties of the speech communication channel. The Speech Transmission Index (STI), which requires applying a specific test signal to the voice transmission system under evaluation, is a good example of an objective method (Sander, 2002). STI, standardized in IEC 60268-16 (version 2, 1998), accounts for band-pass limiting, noise, reverberation, echoes, and nonlinear distortion. Another objective method is the Speech Intelligibility Index (SII), which is obtained by calculation from the physical properties of the speech transmission channel. The SII (formerly AI) accounts for band-pass limiting and noise but ignores the effects of temporal and nonlinear distortion. SII is standardized in ANSI S3.5 (1997). The main difference among these methods lies in how the data are obtained: SII and STI employ data from simulated and emulated systems, respectively, while AI employs predicted values obtained by mathematical calculation. The methods also share a similarity: they all measure the total information content in frequency bands defined over the spectral range of speech.

Intelligibility results obtained from subjective assessments represent real-life scenarios better than the methods described above. However, subjective tests are difficult to apply: they require language-specific test material, selection and training of a talker–listener crew, and so on. Speech materials follow one of three approaches: logatoms (nonsense words), regular words, and sentences. There are also various techniques for presenting the test material to the subjects and for collecting their responses. One way of presenting a test word is to embed it in a carrier phrase. This method has the advantage that the talker can control his/her vocal effort and that, in the case of temporal distortion, the condition is representative of continuous speech. The response method may be open or closed. An open response method lets the listener report whatever he/she thinks was heard, while a closed response method offers the listener a set of alternatives from which a selection has to be made. One example of a closed response test is the Diagnostic Rhyme Test (DRT), which can be used to measure speech intelligibility. The DRT method, developed by Voiers (1977, 1983), is based on only two alternative words. In DRT, listeners are asked to identify the uttered word from a given word pair, drawn from a list of such pairs whose members differ only in the first phoneme. Another closed response method is the Modified Rhyme Test (MRT), in which the listener has to select an initial consonant or a vowel from a group of more than two alternatives (House et al., 1965). DRT has the advantage that it requires only a simple training session for the listeners, while open response tests and MRT require more extensive training.
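Because a closed response test lets listeners guess, raw percent-correct scores are usually adjusted for chance. The sketch below uses the commonly cited correction for guessing, which for a two-alternative test reduces to 100 × (right − wrong) / total. Whether TIT uses exactly this rule is not stated in this excerpt, so the function and its names are illustrative assumptions.

```python
def chance_corrected_score(right, wrong, alternatives=2):
    """Percent score adjusted for guessing in a closed response test.

    With k alternatives, a pure guesser is expected to give one correct
    answer for every (k - 1) wrong ones, so wrong / (k - 1) is
    subtracted from the count of correct answers.
    """
    if alternatives < 2:
        raise ValueError("a closed response test needs at least 2 alternatives")
    total = right + wrong
    if total == 0:
        raise ValueError("no responses to score")
    return 100.0 * (right - wrong / (alternatives - 1)) / total


# Two-alternative (DRT-style) scoring.
print(chance_corrected_score(90, 10))  # 80.0
print(chance_corrected_score(50, 50))  # 0.0 (pure guessing)

# Six alternatives is an assumed value for an MRT-style test.
print(chance_corrected_score(50, 50, alternatives=6))  # 40.0
```

The same raw score of 50% therefore means chance performance with two alternatives but clearly better than chance with six, which is one reason scores from different closed response tests are not directly comparable.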

Considering the evaluation of digital speech communication technologies, which range from basic waveform coding to next generation wired and wireless packet-switching networks, the Turkish Intelligibility Test (TIT), by assessing the intelligibility of speech in the Turkish language, fills a broad gap in QoS evaluation for a great number of speech service users in Turkey and in the Turkic language speaking countries.

Speech communication and processing technologies have a wide variety of application areas, especially in telecommunications and security, among the Turkic language speaking countries, which have a total population of more than 200 million people. Measuring the speech intelligibility of the Turkish language, accepted as the language of the Silk Road, is therefore a promising field for next generation information systems, including Turkish-specific speech agents. Assessing Turkish speech intelligibility also provides good guidance for further speech-based security research.

In this article, a new TIT for assessing speech communication systems (SCSs), including speech coding, text-to-speech, speech communication equipment, etc., is discussed in detail. First, the phonetic properties of the Turkish language and the methodology for selecting TIT material are presented. Then, the TIT procedure is explained, together with the infrastructure of TÜBİTAK-UEKAE’s Acoustics Laboratory. Finally, it is shown experimentally that TIT correlates with the Diagnostic Rhyme Test in North American English (EDRT). In the experimental study, three well-known speech coders (2.4 kbps Linear Predictive Coding, LPC10e; 4.8 kbps Codebook-Excited Linear Prediction, CELP; and 16 kbps Continuously Variable Slope Delta Modulation, CVSD) are assessed in varied acoustic environments. All the results are discussed in detail.
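One simple way to quantify how well two intelligibility tests agree is the Pearson correlation coefficient between their scores over the same set of coder and noise conditions. The sketch below is only an illustration: the excerpt does not say which statistic the authors used, and the score lists are made-up placeholders, not results from the article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values only (one entry per coder/noise condition);
# these are NOT measurements reported in the article.
placeholder_test_a = [92.0, 85.0, 78.0, 88.0, 80.0, 70.0]
placeholder_test_b = [94.0, 86.0, 75.0, 90.0, 79.0, 68.0]
print(round(pearson_r(placeholder_test_a, placeholder_test_b), 3))
```

A coefficient close to 1 would indicate that the two tests rank the conditions similarly, even if their absolute scores differ.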

Turkish intelligibility test

TIT is a robust subjective test method that can be used to assess the intelligibility of SCSs, not only in Turkish but also in other related languages. The increasing demand for speech communication technologies requires a reliable intelligibility test to improve the QoS of speech agents in Turkish.

Experimental results

In the experimental study, the intelligibility of three well-known speech coders (2.4 kbps LPC10e, 4.8 kbps CELP and 16 kbps CVSD) is assessed by TIT in three different acoustic environments (QUIET, 12 dB SNR, 6 dB SNR). The purpose of assessing intelligibility in the Turkish language with these speech coders is twofold: first, the intelligibility performance of these coders in different languages and under various conditions has been assessed by human factor laboratories all over the
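The 12 dB and 6 dB SNR conditions can be illustrated with a minimal sketch that scales a noise signal so that the speech-to-noise power ratio reaches a target value. This excerpt does not describe how the noisy environments were actually produced, so digital additive mixing, the function names, and the synthetic signals below are all assumptions.

```python
import math
import random


def mean_power(samples):
    """Average power (mean squared amplitude) of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that the result has the given SNR."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR(dB) = 10 * log10(P_speech / P_noise)  =>  solve for noise power.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a sine wave for speech, Gaussian noise for the room.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

for snr_db in (12.0, 6.0):
    mixed = mix_at_snr(speech, noise, snr_db)
    residual = [m - s for m, s in zip(mixed, speech)]
    measured = 10 * math.log10(mean_power(speech) / mean_power(residual))
    print(snr_db, round(measured, 6))
```

Halving the target SNR from 12 dB to 6 dB roughly quadruples the noise power relative to the speech, which is why the lower SNR condition is the more demanding one for low bit rate coders.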

Conclusion

Since speech communication technologies are generally developed with one language in mind, the need to broaden linguistic coverage when designing next generation communication systems requires assessment methods in many widely used languages, including Turkish. Furthermore, intelligibility assessment in Turkish is expected to play a leading and critical role in improving the intelligibility of Turkish in SCSs under varying acoustic environments. Since subjective assessment

Acknowledgement

We would like to thank all participants in the assessments for their patience. We are also grateful to the respected members of Istanbul University, Cerrahpaşa Medical Faculty, Ear Nose Throat and Head & Neck Surgery Department.

References (14)

  • Afifi, A.A., et al., 1972. Statistical Analysis: A Computer Oriented Approach.
  • ANSI, 1989. S3.2, Method for Measuring the Intelligibility of Speech over Communication...
  • Banguoğlu, T., 2004. Türkçenin Grameri (Turkish Grammar). Türk Dil Kurumu Yayınları, No. 528, Ankara,...
  • Ergenç, İ., Ölmez, M., 1995. Konuşma Dili ve Türkçenin Söyleyiş Sözlüğü (Language and Pronunciation Dictionary of...
  • Ergin, M., 2002. Türk Dil Bilgisi (Turkish Grammar). Bayrak Basım-Yayım-Tanıtım, İstanbul,...
  • House, A.S., et al., 1965. Articulation testing methods: consonantal differentiation with a closed response set. J. Acoust. Soc. Am.
  • Institute of RWTÜV GmbH, 2001. Report ref: RWTÜV 011128G01-IAC, Germany, December...