Detecting laughter in spontaneous speech by constructing laughter bouts

Li, Yan-Xiong; He, Qian-Hua

doi:10.1007/s10772-011-9097-1

Published: 14 July 2011

Volume 14, pages 211–225, (2011)

International Journal of Speech Technology

Yan-Xiong Li¹ & Qian-Hua He¹

Abstract

Laughter occurs frequently in spontaneous speech (e.g. conversational speech, meeting speech). Detecting it is important for semantic analysis, highlight extraction, spontaneous speech recognition, etc. In this paper, we first analyze the characteristic differences between speech and laughter, and then propose an approach for detecting laughter in spontaneous speech. In the proposed approach, non-silence signal segments are first extracted from spontaneous speech using voice activity detection and then split into syllables. Possible laughter bouts are then constructed by merging adjacent syllables (using a symmetrical Itakura distance measure and a duration threshold) instead of using a sliding fixed-length window. Finally, hidden Markov models (HMMs) are used to classify the possible laughter bouts as laughs, speech sounds, or other sounds. Experimental evaluations show that the proposed approach achieves satisfactory results in detecting two types of audible laughs (audible solo and group laughs). The precision rate, recall rate, and F1-measure (the harmonic mean of precision and recall) are 83.4%, 86.1%, and 84.7%, respectively. Compared with the sliding-window-based approach, an absolute improvement of 4.9% in F1-measure is obtained. In addition, the laughter boundary errors obtained by the proposed approach are smaller than those obtained by the sliding-window-based approach.
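The bout-construction step and the reported F1-measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the merging rule or the threshold values, so the `Segment` type, the greedy left-to-right merge, the reading of the duration threshold as a maximum pause between syllables, and the parameter names `max_distance` and `max_gap` are all assumptions. The Euclidean distance in the usage example is only a stand-in for the symmetrical Itakura distance.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Segment:
    start: float               # start time in seconds
    end: float                 # end time in seconds
    feature: Sequence[float]   # hypothetical per-syllable feature vector


def merge_syllables(
    syllables: List[Segment],
    distance: Callable[[Sequence[float], Sequence[float]], float],
    max_distance: float,
    max_gap: float,
) -> List[Segment]:
    """Greedily merge adjacent syllables into candidate bouts.

    Two neighbours are merged when their features are close
    (distance <= max_distance) and the pause between them is short
    (gap <= max_gap). Both thresholds are illustrative assumptions.
    """
    bouts: List[Segment] = []
    for syl in syllables:
        if bouts:
            last = bouts[-1]
            gap = syl.start - last.end
            if gap <= max_gap and distance(last.feature, syl.feature) <= max_distance:
                # Extend the current bout and keep the newest syllable's
                # feature, so the next comparison is between true neighbours.
                bouts[-1] = Segment(last.start, syl.end, syl.feature)
                continue
        bouts.append(Segment(syl.start, syl.end, syl.feature))
    return bouts


def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in distance; the paper uses a symmetrical Itakura distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Made-up syllables: three similar, closely spaced ones and one distant one.
syllables = [
    Segment(0.00, 0.15, [1.0, 0.0]),
    Segment(0.20, 0.35, [1.1, 0.0]),
    Segment(0.40, 0.55, [0.9, 0.1]),
    Segment(1.50, 1.80, [5.0, 4.0]),
]
bouts = merge_syllables(syllables, euclidean, max_distance=0.5, max_gap=0.2)
f1 = f1_measure(83.4, 86.1)  # precision and recall reported in the abstract
```

With these made-up inputs the first three syllables merge into one candidate bout and the fourth stays separate. The F1 value computed from the reported precision and recall rounds to 84.7, consistent with the figure in the abstract. Each candidate bout would then be passed to a classifier such as the HMMs described above.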

Author information

Authors and Affiliations

School of Electronic and Information Engineering, South China University of Technology, No. 381 Wushan Road, Guangzhou, 510640, Guangdong Province, China
Yan-Xiong Li & Qian-Hua He

Corresponding author

Correspondence to Yan-Xiong Li.

About this article

Cite this article

Li, YX., He, QH. Detecting laughter in spontaneous speech by constructing laughter bouts. Int J Speech Technol 14, 211–225 (2011). https://doi.org/10.1007/s10772-011-9097-1

Received: 13 March 2011
Accepted: 14 June 2011
Published: 14 July 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10772-011-9097-1
