Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition

Wu, Chung-Hsien; Yan, Gwo-Lang

doi:10.1023/B:VLSI.0000015089.17975.f4


Published: 01 February 2004

Volume 36, pages 91–104 (2004)

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology


Abstract

Most automatic speech recognizers (ASRs) focus on read speech, which differs from spontaneous speech, where disfluencies are common. Such recognizers cannot reliably handle speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts, and silent pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “hem” in spontaneous speech. The Karhunen-Loève transform (KLT) and linear discriminant analysis (LDA) were adopted to select discriminant features for filled pause detection. To determine the number of discriminant features, Bartlett hypothesis testing was adopted, and twenty-six features were selected. Gaussian mixture models (GMMs) trained with a gradient descent algorithm were used to improve filled pause detection performance. The experimental results show that the filled pause detection rates using KLT and LDA were 84.4% and 86.8%, respectively. A significant improvement in the filled pause detection rate was obtained using the discriminative GMM with KLT and LDA features. In addition, the LDA features outperformed the KLT features in detecting filled pauses.
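The abstract names the feature-selection techniques but not the authors' exact formulation (feature vectors, class definitions, significance level). As a generic illustration only, the NumPy sketch below computes a KLT basis (eigenvectors of the sample covariance), an LDA basis (the generalized eigenproblem S_b v = λ S_w v), and Bartlett's V statistic, a textbook test for how many discriminant directions are significant. All function names are hypothetical, the synthetic data is a stand-in for acoustic features, and the discriminative GMM training step is not sketched; this is not the paper's implementation.

```python
import numpy as np


def scatter_matrices(X, y):
    """Within-class (S_w) and between-class (S_b) scatter, as sums of outer products."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for label in np.unique(y):
        Xk = X[y == label]
        mk = Xk.mean(axis=0)
        centered = Xk - mk
        s_w += centered.T @ centered
        diff = (mk - mean_all)[:, None]
        s_b += len(Xk) * (diff @ diff.T)
    return s_w, s_b


def klt_basis(X):
    """KLT/PCA: eigenvectors of the sample covariance, sorted by decreasing variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def lda_basis(X, y):
    """LDA: solve S_b v = lam * S_w v, sorted by decreasing lam.

    Whitens with a Cholesky factor of S_w so the problem becomes a symmetric
    eigenproblem; S_w must be positive definite (enough samples per dimension).
    """
    s_w, s_b = scatter_matrices(X, y)
    L_inv = np.linalg.inv(np.linalg.cholesky(s_w))  # S_w = L L^T
    m = L_inv @ s_b @ L_inv.T                        # same eigenvalues as S_w^-1 S_b
    eigvals, u = np.linalg.eigh(m)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], (L_inv.T @ u)[:, order]


def bartlett_v(lda_eigvals, n_samples, n_classes, k):
    """Bartlett's V for H0: the discriminant eigenvalues after the first k are zero.

    V = [N - 1 - (p + g)/2] * sum_{j>k} ln(1 + lam_j), approximately chi-square
    with (p - k)(g - k - 1) degrees of freedom (p features, g classes).
    """
    p = len(lda_eigvals)
    s = min(p, n_classes - 1)                   # at most g - 1 nonzero eigenvalues
    lam = np.clip(lda_eigvals[k:s], 0.0, None)  # clip tiny negative round-off
    v = (n_samples - 1 - (p + n_classes) / 2.0) * np.sum(np.log1p(lam))
    dof = (p - k) * (n_classes - k - 1)
    return float(v), dof


def project(X, basis, n_components):
    """Project mean-removed features onto the leading n_components basis vectors."""
    return (X - X.mean(axis=0)) @ basis[:, :n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 3-class, 4-dimensional data standing in for acoustic features.
    means = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]])
    X = np.vstack([rng.normal(mu, 1.0, size=(50, 4)) for mu in means])
    y = np.repeat([0, 1, 2], 50)
    lam, W = lda_basis(X, y)
    for k in range(3):
        print(k, bartlett_v(lam, len(X), 3, k))
    print(project(X, W, 2).shape)
```

Under this sketch, one would retain the smallest k whose V falls below the chi-square critical value for `dof` degrees of freedom at the chosen significance level (for example `scipy.stats.chi2.ppf(0.95, dof)` if SciPy is available). Standard LDA yields at most g − 1 informative directions for g classes; the abstract does not specify the class definitions or feature vectors behind the twenty-six selected features.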



Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, Republic of China
Chung-Hsien Wu & Gwo-Lang Yan


About this article

Cite this article

Wu, CH., Yan, GL. Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 91–104 (2004). https://doi.org/10.1023/B:VLSI.0000015089.17975.f4


Issue Date: February 2004
DOI: https://doi.org/10.1023/B:VLSI.0000015089.17975.f4
