
Recognition of emergency situations using audio–visual perception sensor network for ambient assistive living

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In this paper, we present a perception sensor network (PSN) that detects audio- and visual-based emergency situations, such as a quarrel between students involving screaming and punching, to help maintain school safety. The PSN is composed of ambient sensor units, each built from a Kinect, a pan–tilt–zoom (PTZ) camera, and a control board that together acquire raw audio signals as well as color and depth images. Audio signals captured by the Kinect microphone array are used to recognize sound classes and to localize sound sources. Visual signals from the Kinect and the PTZ camera stream are used to detect people, identify them, and recognize their gestures. Within the system, fusion methods combine multi-person detection and tracking, face identification, and audio–visual emergency recognition. Two approaches, a matching pursuit algorithm and covariance matrices of dense trajectories, are applied to reliably recognize abnormal student activities. In this way, human-caused emergencies are detected automatically, along with the place of occurrence, the person involved, and the emergency type. A PSN consisting of four units was used in experiments to detect designated targets performing abnormal actions in multi-person scenarios. Evaluation of the individual perception capabilities and of the integrated system confirmed that the proposed system can provide meaningful information offering substantive support to teachers and staff members in school environments.
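The abstract names matching pursuit as one of the two recognition approaches. As an illustration only (not the authors' implementation), the sketch below shows the core greedy loop of matching pursuit in Python/NumPy: a signal is decomposed over a dictionary of unit-norm atoms by repeatedly selecting the atom most correlated with the residual, and the resulting coefficient vector can then serve as a feature for a sound classifier. The random Gabor-like dictionary here is a hypothetical stand-in for whatever dictionary the paper actually uses.

```python
import numpy as np

def gabor_dictionary(n, n_atoms=64, seed=0):
    """Hypothetical dictionary of unit-norm Gabor-like atoms of length n."""
    rng = np.random.default_rng(seed)
    atoms = []
    t = np.arange(n)
    for _ in range(n_atoms):
        f = rng.uniform(0.01, 0.45)      # normalized frequency of the carrier
        s = rng.uniform(n / 16, n / 4)   # width of the Gaussian envelope
        c = rng.uniform(0, n)            # center of the envelope
        atom = np.exp(-0.5 * ((t - c) / s) ** 2) * np.cos(2 * np.pi * f * t)
        atoms.append(atom / np.linalg.norm(atom))
    return np.array(atoms)               # shape (n_atoms, n)

def matching_pursuit(signal, dictionary, n_iter=10):
    """Greedy MP: pick the atom most correlated with the residual each step."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(len(dictionary))
    for _ in range(n_iter):
        corr = dictionary @ residual          # inner products with all atoms
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        coeffs[k] += corr[k]                  # accumulate its coefficient
        residual -= corr[k] * dictionary[k]   # subtract its contribution
    return coeffs, residual
```

Because each atom has unit norm, every iteration strictly reduces the residual energy; after a few iterations the coefficient vector is a sparse summary of the signal that can be fed to an off-the-shelf classifier.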




Acknowledgements

We give special thanks to Dr. Seong-Whan Lee and Nam-Gyu Cho, staff members at Korea University, for their technical cooperation and assistance in the integrated experiments and insightful discussions. The research was supported partly by the ‘Implementation of Technologies for Identification, Behavior, and Location of Human based on Sensor Network Fusion’ Program and ‘Development of Social Robot Intelligence for Social Human-Robot Interaction of Service Robots’ program through the Ministry of Trade, Industry, and Energy (Grant Number: 10041629 [SimonPiC] and 10077468 [DeepTasK]) and partly by ICT R&D programs of IITP (2015-0-00197 [LISTEN] and 2017-0-00432 [BCI]).

Author information


Corresponding author

Correspondence to JongSuk Choi.


Cite this article

Yun, SS., Nguyen, Q. & Choi, J. Recognition of emergency situations using audio–visual perception sensor network for ambient assistive living. J Ambient Intell Human Comput 10, 41–55 (2019). https://doi.org/10.1007/s12652-017-0597-y
