An Automatic Shout Detection System Using Speech Production Features

Mittal, Vinay Kumar; Yegnanarayana, Bayya

doi:10.1007/978-3-319-15557-9_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8757)

Included in the following conference series:

International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction

Abstract

Automatic detection of shout in continuous speech is a challenging task. In our recent study, we examined the characteristics of shout and normal speech signals together with electroglottograph (EGG) signals. That study highlighted changes in the characteristics of both the excitation source and the vocal tract system during the production of shout, relative to normal speech. In this paper, we aim to develop an automatic system that detects regions of shout in continuous speech, based on these changes in production characteristics. Discriminating production features such as instantaneous fundamental frequency, strength of excitation, dominant frequency and spectral band energy ratio are extracted from the speech signal. Parameters for the shout decision are derived to capture the average levels of these features, their temporal changes and their pairwise mutual relations. A speaker- and language-independent prototype shout detection system is developed. Performance evaluation over four databases gave encouraging results.
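
As a rough illustration of two of the spectral features named in the abstract, the sketch below computes a band energy ratio and a peak frequency for a single frame. This page does not give the authors' definitions, so everything here is an assumption: the 1000 Hz band split, the function names, and the use of the largest DFT peak as a stand-in for dominant frequency are all hypothetical and should not be read as the method of the paper.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a real-valued frame via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energy_ratio(frame, sample_rate, split_hz=1000.0):
    """Energy above split_hz divided by energy at or below it.

    The split frequency is an assumed value, not taken from the paper.
    """
    power = power_spectrum(frame)
    n = len(frame)
    low = high = 0.0
    for k, p in enumerate(power):
        if k * sample_rate / n <= split_hz:
            low += p
        else:
            high += p
    return high / low if low > 0 else float("inf")


def peak_frequency(frame, sample_rate):
    """Frequency in Hz of the largest non-DC DFT bin.

    A crude proxy for dominant frequency, used here only for illustration.
    """
    power = power_spectrum(frame)
    k = max(range(1, len(power)), key=lambda i: power[i])
    return k * sample_rate / len(frame)


def tone(freq_hz, amplitude=1.0, sample_rate=8000, n=256):
    """Synthetic sine frame used as toy input (not real speech)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


# Toy frames: mostly low-frequency energy versus more high-frequency energy.
low_heavy = [a + b for a, b in zip(tone(500, 1.0), tone(2000, 0.2))]
high_heavy = [a + b for a, b in zip(tone(500, 0.5), tone(2000, 1.0))]

print(band_energy_ratio(low_heavy, 8000), band_energy_ratio(high_heavy, 8000))
print(peak_frequency(high_heavy, 8000))
```

In this toy setup the frame with more energy above the split gets the larger ratio. A real detector would also need the excitation features (instantaneous fundamental frequency and strength of excitation) and a decision rule, neither of which is specified on this page.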

Acknowledgement

This work is partially supported by a research collaboration between the Speech Vision Laboratory, IIIT, Hyderabad and SAIT, SRI, Bangalore (2010-2013).

Author information

Authors and Affiliations

International Institute of Information Technology, Hyderabad, India
Vinay Kumar Mittal & Bayya Yegnanarayana

Corresponding author

Correspondence to Vinay Kumar Mittal.

Editor information

Editors and Affiliations

Otto von Guericke University, Magdeburg, Germany
Ronald Böck
Trinity College, Dublin, Ireland
Francesca Bonin
Trinity College, Dublin, Ireland
Nick Campbell
Utrecht University, Utrecht, The Netherlands
Ronald Poppe

About this paper

Cite this paper

Mittal, V.K., Yegnanarayana, B. (2015). An Automatic Shout Detection System Using Speech Production Features. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_9

DOI: https://doi.org/10.1007/978-3-319-15557-9_9
Published: 12 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15556-2
Online ISBN: 978-3-319-15557-9
eBook Packages: Computer Science, Computer Science (R0)
