Understanding Lombard speech: a review of compensation techniques towards improving speech based recognition systems

Artificial Intelligence Review

Abstract

Building voice-based Artificial Intelligence (AI) systems that can interact efficiently with humans through speech has become feasible thanks to rapid advances in data-driven AI techniques. In the real world, such human–machine voice interaction often takes place in a noisy ambience, where humans tend to speak with greater vocal effort than in a quiet ambience in order to compensate for the noise-induced suppression of vocal self-feedback. Speech produced with this noise-induced change in vocal effort is called Lombard speech. To build intelligent conversational devices that can operate in a noisy ambience, it is imperative to study the characteristics and processing of Lombard speech. Although research on Lombard speech began several decades ago, it needs to be explored further in the current scenario, which is seeing an explosion of voice-driven applications. A system designed to work with normal speech spoken in a quiet ambience fails to deliver the same performance when the environmental context changes. Different contexts lead to different styles of Lombard speech, and hence there is a need for efficient ways of handling variations in speaking style in noise. Lombard speech is also more intelligible than the normal speech of the same speaker. Applications with a speech output interface, such as public announcement systems, should therefore vary their vocal effort in real time, in the way that humans adapt their speech in noise, to enhance naturalness. This review article attempts to summarize the progress of work on processing Lombard speech to build smart and robust human–machine interactive systems with a speech input–output interface, irrespective of the operating environmental context, for different application needs.
This article is a comprehensive review of studies on Lombard speech. It highlights the key differences observed in acoustic and perceptual analyses of Lombard speech and details the Lombard effect compensation methods aimed at improving the robustness of speech-based recognition systems.

Corresponding author

Correspondence to S. Uma Maheswari.

About this article

Cite this article

Uma Maheswari, S., Shahina, A. & Nayeemulla Khan, A. Understanding Lombard speech: a review of compensation techniques towards improving speech based recognition systems. Artif Intell Rev 54, 2495–2523 (2021). https://doi.org/10.1007/s10462-020-09907-5

  • DOI: https://doi.org/10.1007/s10462-020-09907-5
