An Automatic Shout Detection System Using Speech Production Features

  • Conference paper
Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction (MA3HMI 2014)

Abstract

Automatic detection of shout in continuous speech is a challenging task. In our recent study, the characteristics of shouted and normal speech signals were examined along with the corresponding electroglottograph (EGG) signals. That study highlighted how the characteristics of both the excitation source and the vocal tract system change during the production of shout, relative to normal speech. In this paper, we aim to develop an automatic system that detects regions of shout in continuous speech, based on these changes in the production characteristics of shouted speech. Discriminating production features, such as instantaneous fundamental frequency, strength of excitation, dominant frequency, and spectral band energy ratio, are extracted from the speech signal. Parameters for the shout decision are derived that capture the average level of the features, their temporal changes, and their pairwise mutual relations. A speaker- and language-independent prototype automatic shout detection system is developed. Performance evaluation over four databases gave encouraging results.
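
To make two of the features named in the abstract concrete, the following is a minimal sketch of how a dominant frequency and a spectral band energy ratio could be computed for one short speech frame. It is an illustration under stated assumptions, not the authors' method: the function names, the 1000 Hz band split, and the use of a plain DFT peak as the "dominant frequency" are all hypothetical choices that do not come from the paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N//2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest non-DC bin (assumed definition)."""
    power = power_spectrum(frame)
    peak_bin = max(range(1, len(power)), key=lambda k: power[k])
    return peak_bin * sample_rate / len(frame)


def band_energy_ratio(frame, sample_rate, split_hz=1000.0):
    """Energy above split_hz divided by energy at or below it, DC excluded.

    The 1000 Hz split is an illustrative assumption, not a value from the paper.
    """
    power = power_spectrum(frame)
    n = len(frame)
    low = high = 0.0
    for k in range(1, len(power)):
        if k * sample_rate / n <= split_hz:
            low += power[k]
        else:
            high += power[k]
    return high / low if low > 0 else float("inf")


def two_tone(low_amp, high_amp, sample_rate=8000, n=160):
    """Synthetic 20 ms frame: a 500 Hz tone plus a 2000 Hz tone."""
    return [low_amp * math.sin(2 * math.pi * 500 * i / sample_rate)
            + high_amp * math.sin(2 * math.pi * 2000 * i / sample_rate)
            for i in range(n)]


# Hypothetical frames: more high-band energy stands in for raised vocal effort.
normal_like = two_tone(low_amp=1.0, high_amp=0.5)
shout_like = two_tone(low_amp=1.0, high_amp=2.0)
ratio_normal = band_energy_ratio(normal_like, 8000)
ratio_shout = band_energy_ratio(shout_like, 8000)
```

On these synthetic frames the ratio is larger for the frame with more high-band energy, which is the kind of contrast a shout decision could threshold or track over time. Real speech would need windowing, a faster transform, and feature definitions taken from the paper itself.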



Acknowledgement

This work is partially supported by a research collaboration between Speech Vision Laboratory, IIIT, Hyderabad and SAIT, SRI, Bangalore (2010-2013).

Author information

Corresponding author

Correspondence to Vinay Kumar Mittal.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mittal, V.K., Yegnanarayana, B. (2015). An Automatic Shout Detection System Using Speech Production Features. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_9

  • DOI: https://doi.org/10.1007/978-3-319-15557-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15556-2

  • Online ISBN: 978-3-319-15557-9

  • eBook Packages: Computer Science, Computer Science (R0)
