
Automatic stress exaggeration by prosody modification to assist language learners perceive sentence stress

International Journal of Speech Technology

Abstract

In this paper, we propose a set of automatic stress exaggeration methods that enlarge the differences between stressed and unstressed syllables. These methods can be used in computer-aided language learning systems to help second language learners perceive stress patterns. They are intended to support hyper-pronunciation training, which teachers commonly use in classrooms. In hyper-pronunciation training, exaggeration helps learners become more aware of acoustic features and apply these features effectively in their own pronunciation. Duration, pitch and intensity have been claimed to be the main acoustic features closely related to stress in English. Accordingly, we propose four stress exaggeration methods: (i) duration-based stress exaggeration, (ii) pitch-based stress exaggeration, (iii) intensity-based stress exaggeration, and (iv) a combined method that integrates the duration-based, pitch-based and intensity-based methods. Our perceptual experiments show that stimuli resynthesised with the proposed stress exaggeration methods significantly improve the ability of learners of English as a Second Language (ESL) to perceive English stress patterns.
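The four methods can be illustrated at the level of per-syllable prosodic targets. The following is a minimal sketch and not the authors' implementation. The `Syllable` record, the function names and all scaling values (a 1.3x stretch and 0.9x shrink in duration, a 2-semitone F0 raise, a 3 dB gain) are hypothetical choices, and stress labels are assumed to be known in advance. The sketch only computes exaggerated targets. Producing audio would additionally require a prosody-modification and resynthesis step, which is not shown.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Syllable:
    """Hypothetical per-syllable prosodic measurements."""
    text: str
    stressed: bool
    duration_ms: float
    f0_hz: float
    intensity_db: float


def exaggerate_duration(sylls: List[Syllable], stretch: float = 1.3,
                        shrink: float = 0.9) -> List[Syllable]:
    # Lengthen stressed syllables and shorten unstressed ones (factors are assumptions).
    return [replace(s, duration_ms=s.duration_ms * (stretch if s.stressed else shrink))
            for s in sylls]


def exaggerate_pitch(sylls: List[Syllable], semitones: float = 2.0) -> List[Syllable]:
    # Raise the F0 target of stressed syllables by a fixed interval in semitones.
    ratio = 2.0 ** (semitones / 12.0)
    return [replace(s, f0_hz=s.f0_hz * ratio) if s.stressed else s for s in sylls]


def exaggerate_intensity(sylls: List[Syllable], boost_db: float = 3.0) -> List[Syllable]:
    # Add a fixed gain in dB to stressed syllables only.
    return [replace(s, intensity_db=s.intensity_db + boost_db) if s.stressed else s
            for s in sylls]


def exaggerate_combined(sylls: List[Syllable]) -> List[Syllable]:
    # Chain the three single-cue methods; each one modifies a different field.
    return exaggerate_intensity(exaggerate_pitch(exaggerate_duration(sylls)))


# Illustrative (made-up) measurements for "banana", stressed on the second syllable.
banana = [
    Syllable("ba", False, 100.0, 180.0, 62.0),
    Syllable("na", True, 200.0, 200.0, 68.0),
    Syllable("na", False, 150.0, 170.0, 60.0),
]
exaggerated = exaggerate_combined(banana)
for s in exaggerated:
    print(s)
```

Because every method returns new `Syllable` objects, the original measurements stay available for side-by-side comparison with the exaggerated version. In this sketch the stressed-to-unstressed duration ratio between the first two syllables grows from 2.0 to about 2.9.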



Author information


Correspondence to Ruili Wang.


About this article

Cite this article

Lu, J., Wang, R. & De Silva, L.C. Automatic stress exaggeration by prosody modification to assist language learners perceive sentence stress. Int J Speech Technol 15, 87–98 (2012). https://doi.org/10.1007/s10772-011-9124-2


  • DOI: https://doi.org/10.1007/s10772-011-9124-2
