Detection of complex video events through visual rhythm

  • Original Article
The Visual Computer

Abstract

The recognition of complex events in videos currently has several important applications, largely because digital cameras are now widely deployed in environments such as airports, train and bus stations, shopping centers, stadiums, hospitals, schools, buildings, and roads. Advances in digital technology have enhanced video event detection through devices with high resolution, small physical size, and high sampling rates. This work presents and evaluates the use of feature descriptors extracted from visual rhythms of video sequences in three computer vision problems: abnormal event detection, human action classification, and gesture recognition. Experiments conducted on well-known public datasets demonstrate that the method produces promising results.
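To make the term concrete: a visual rhythm is commonly built by sampling one line of pixels from every frame and stacking the samples over time, so that motion appears as slanted or curved traces in a single 2D image. The sketch below assumes grayscale frames held in a NumPy array and samples one horizontal row per frame. The function name `visual_rhythm`, the default central row, and the synthetic clip are illustrative assumptions; the sampling scheme actually used in the article is not visible in this preview.

```python
from typing import Optional

import numpy as np


def visual_rhythm(frames: np.ndarray, row: Optional[int] = None) -> np.ndarray:
    """Stack one horizontal pixel line per frame into a 2D image.

    frames: grayscale video as an array of shape (T, H, W).
    row:    index of the sampled row (assumption: central row by default).
    Returns an array of shape (W, T); column t is the sampled line of frame t.
    """
    if frames.ndim != 3:
        raise ValueError("expected a (T, H, W) grayscale array")
    _, height, _ = frames.shape
    if row is None:
        row = height // 2
    if not 0 <= row < height:
        raise ValueError("row is outside the frame")
    # frames[:, row, :] has shape (T, W); transpose so time runs left to right.
    return frames[:, row, :].T


# Synthetic clip (hypothetical data): a bright pixel moving one step right per frame.
frames = np.zeros((5, 4, 6))
for t in range(5):
    frames[t, 2, t] = 1.0

rhythm = visual_rhythm(frames)
```

In this toy clip the moving pixel produces a diagonal trace in `rhythm`, which is the kind of pattern a descriptor computed on the rhythm image could capture.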

Notes

  1. Although some efforts have been made to differentiate among these terms, in this work, abnormal events will be assumed to be similar to unusual, rare, suspicious, anomalous, irregular, outlying or atypical events.

  2. Normalcy or normality is the state of being normal or usual.

  3. An object that fits the principle of distance conservation [54].

  4. Many works in the literature use the first k frames for training, where k varies from 200 to 300.
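The training convention in note 4 can be sketched as follows, assuming one feature vector per frame. The helper names `split_first_k` and `anomaly_scores`, the synthetic features, and the distance-to-mean score are illustrative assumptions only; the score is a toy stand-in for a normalcy model and is not the classifier evaluated in the article.

```python
import numpy as np


def split_first_k(features, k=200):
    """Use the first k per-frame feature vectors for training, the rest for testing."""
    features = np.asarray(features, dtype=float)
    if not 0 < k < len(features):
        raise ValueError("k must be between 1 and len(features) - 1")
    return features[:k], features[k:]


def anomaly_scores(train, test):
    """Toy stand-in score: Euclidean distance of each test row to the training mean."""
    mean = train.mean(axis=0)
    return np.linalg.norm(test - mean, axis=1)


# Synthetic features (hypothetical data): 250 normal frames, then 50 abnormal ones.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(250, 8))
abnormal = rng.normal(6.0, 1.0, size=(50, 8))
features = np.vstack([normal, abnormal])

train, test = split_first_k(features, k=200)
scores = anomaly_scores(train, test)
```

Because only the first k frames are used to model normal behavior, this convention presumes that the opening frames of a sequence contain no abnormal event.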

References

  1. Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 16 (2011)

  2. Alcantara, M., Moreira, T., Pedrini, H.: Real-time action recognition based on cumulative motion shapes. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2917–2921 (2014)

  3. Alcantara, M., Moreira, T., Pedrini, H.: Real-time action recognition using a multilayer descriptor with variable size. J. Electron. Imaging 25(1), 013020 (2016)

  4. Almotairi, S.M.: Using variations of shape and appearance in alignment methods for classifying human actions. Florida Institute of Technology, Melbourne (2014)

  5. Antonucci, A., De Rosa, R., Giusti, A., Cuzzolin, F.: Robust classification of multivariate time series by imprecise hidden Markov models. Int. J. Approx. Reason. 56, 249–263 (2015)

  6. Berent, J., Dragotti, P.: Segmentation of epipolar-plane image volumes with occlusion and disocclusion competition. In: IEEE 8th Workshop on Multimedia Signal Processing, pp. 182–185 (2006)

  7. Biswas, S., Babu, R.V.: Real time anomaly detection in H.264 compressed videos. In: Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, pp. 1–4. IEEE (2013)

  8. Blackburn, J., Ribeiro, E.: Human motion recognition using isomap and dynamic time warping. In: Human Motion: Understanding, Modeling, Capture and Animation, pp. 285–298. Springer (2007)

  9. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: International Conference on Computer Vision, Beijing, pp. 1395–1402 (2005)

  10. Bolles, R.C., Baker, H.H.: Epipolar-plane image analysis: a technique for analyzing motion sequences. In: 3rd IEEE Workshop on Computer Vision, Representation, and Control, pp. 168–178. IEEE (1985)

  11. Boughorbel, S., Tarel, J.P., Boujemaa, N.: Generalized histogram intersection kernel for image recognition. In: IEEE International Conference on Image Processing, vol. 3, pp. III–161. IEEE (2005)

  12. Bourke, A., O’Brien, J., Lyons, G.: Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm. Gait Posture 26(2), 194–199 (2007)

  13. Buch, N., Velastin, S., Orwell, J.: A review of computer vision techniques for the analysis of urban traffic. IEEE Trans. Intell. Transp. Syst. 12(3), 920–939 (2011)

  14. Calderara, S., Heinemann, U., Prati, A., Cucchiara, R., Tishby, N.: Detecting anomalies in people’s trajectories using spectral graph analysis. Comput. Vis. Image Underst. 115(8), 1099–1111 (2011)

  15. Candamo, J., Shreve, M., Goldgof, D., Sapper, D., Kasturi, R.: Understanding transit scenes: a survey on human behavior-recognition algorithms. IEEE Trans. Intell. Transp. Syst. 11(1), 206–224 (2010)

  16. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 6, 679–698 (1986)

  17. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)

  18. Chen, D.Y., Huang, P.C.: Motion-based unusual event detection in human crowds. J. Vis. Commun. Image Represent. 22(2), 178–186 (2011)

  19. Cong, Y., Yuan, J., Liu, J.: Abnormal event detection in crowded scenes using sparse representation. Pattern Recognit. 46(7), 1851–1864 (2013)

  20. Cui, L., Li, K., Chen, J., Li, Z.: Abnormal event detection in traffic video surveillance based on local features. In: 4th International Congress on Image and Signal Processing, vol. 1, pp. 362–366. IEEE (2011)

  21. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)

  22. De Rosa, R., Cesa-Bianchi, N., Gori, I., Cuzzolin, F.: Online action recognition via nonparametric incremental learning. In: British Machine Vision Conference. BMVA Press (2014)

  23. Dee, H.M., Velastin, S.A.: How close are we to solving the problem of automated visual surveillance? Mach. Vis. Appl. 19(5–6), 329–343 (2007)

  24. Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Del Bimbo, A.: Space-time pose representation for 3D human action recognition. In: International Conference on Image Analysis and Processing, pp. 456–464. Springer (2013)

  25. Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72. IEEE (2005)

  26. Duda, R.O., Hart, P.E.: Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 15(1), 11–15 (1972)

  27. Fanello, S.R., Gori, I., Metta, G., Odone, F.: Keep it simple and sparse: real-time action recognition. J. Mach. Learn. Res. 14(1), 2617–2640 (2013)

  28. Farneback, G.: Two-frame motion estimation based on polynomial expansion. In: Image Analysis, pp. 363–370. Springer (2003)

  29. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)

  30. Fawzy, F., Abdelwahab, M., Mikhael, W.: 2DHOOF-2DPCA contour based optical flow algorithm for human activity recognition. In: IEEE 56th International Midwest Symposium on Circuits and Systems, pp. 1310–1313 (2013)

  31. Feng, J., Zhang, C., Hao, P.: Online learning with self-organizing maps for anomaly detection in crowd scenes. In: 20th International Conference on Pattern Recognition, vol. 1, pp. 3599–3602 (2010)

  32. Fortun, D., Bouthemy, P., Kervrann, C.: Optical flow modeling and computation: a survey. Comput. Vis. Image Underst. 134, 1–21 (2015)

  33. Gong, S., Xiang, T.: Person re-identification. In: Visual Analysis of Behaviour, pp. 301–313. Springer (2011)

  34. Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)

  35. Guimarães, S., de A.-Araújo, A., Couprie, M., Leite, N.: An approach to detect video transitions based on mathematical morphology. In: International Conference on Image Processing, vol. 3, pp. II–1021–4 (2003)

  36. Guo, K., Ishwar, P., Konrad, J.: Action recognition from video using feature covariance matrices. IEEE Trans. Image Process. 22(6), 2479–2494 (2013)

  37. Horn, B.K., Schunck, B.G.: Determining optical flow. In: 1981 Technical Symposium East, pp. 319–331. International Society for Optics and Photonics (1981)

  38. Hung, T.Y., Lu, J., Tan, Y.P.: Cross-scene abnormal event detection. In: IEEE International Symposium on Circuits and Systems, pp. 2844–2847 (2013)

  39. Ji, S., Xu, W., Yang, M., Yu, K.: 3D Convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)

  40. Jiang, F., Yuan, J., Tsaftaris, S.A., Katsaggelos, A.K.: Anomalous video event detection using spatiotemporal context. Comput. Vis. Image Underst. 115(3), 323–333 (2011)

  41. Jiang, X., Zhong, F., Peng, Q., Qin, X.: Online robust action recognition based on a hierarchical model. Vis. Comput. 30(9), 1021–1033 (2014)

  42. Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: Tenth IEEE International Conference on Computer Vision, vol. 1, pp. 166–173. IEEE (2005)

  43. Kliper-Gross, O., Hassner, T., Wolf, L.: The action similarity labeling challenge. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 615–621 (2012)

  44. Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64(2–3), 107–123 (2005)

  45. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)

  46. Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 18–32 (2014)

  47. Liu, L., Shao, L.: Learning discriminative representations from RGB-D video data. In: International Joint Conference on Artificial Intelligence, vol. 1, p. 3 (2013)

  48. Lowe, D.: Object recognition from local scale-invariant features. In: Computer Vision, vol. 2, pp. 1150–1157 (1999)

  49. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. Int. Jt. Conf. Artif. Intell. 81, 674–679 (1981)

  50. Mahadevan, V., Li, W., Bhalodia, V., Vasconcelos, N.: Anomaly detection in crowded scenes. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1975–1981 (2010)

  51. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: 26th Annual International Conference on Machine Learning, pp. 689–696. ACM (2009)

  52. McCahill, M., Norris, C.: CCTV systems in London: their structures and practices. Tech. rep., Centre for Criminology and Criminal Justice, University of Hull (2003)

  53. Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 935–942 (2009)

  54. Mitiche, A.: Rigid body kinematics: some basic notions. In: Computational Analysis of Visual Motion. Advances in Computer Vision and Machine Intelligence, pp. 31–43. Springer, US (1994)

  55. Molchanov, P., Yang, X., Gupta, S., Kim, K., Tyree, S., Kautz, J.: Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4207–4215 (2016)

  56. Moreira, T., Alcantara, M., Pedrini, H., Menotti, D.: Fast and accurate gesture recognition based on motion shapes. In: Iberoamerican Congress on Pattern Recognition, pp. 247–254. Springer (2015)

  57. Nalwa, V.S.: A Guided Tour of Computer Vision. Addison-Wesley Longman Publishing Co. Inc., Boston (1993)

  58. Nam, Y.: Crowd flux analysis and abnormal event detection in unstructured and structured scenes. Multimed. Tools Appl. 72(3), 3001–3029 (2014)

  59. Ngo, C., Pong, T., Chin, R.: Detection of gradual transitions through temporal slice analysis. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1 (1999)

  60. Ngo, C.W., Pong, T.C., Zhang, H.J.: Motion analysis and segmentation through spatio-temporal slices processing. IEEE Trans. Image Process. 12(3), 341–355 (2003)

  61. Niebles, J.C., Fei-Fei, L.: A hierarchical model of shape and appearance for human action classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)

  62. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. 79(3), 299–318 (2008)

  63. Nishida, N., Nakayama, H.: Multimodal gesture recognition using multi-stream recurrent neural network. In: Pacific-Rim Symposium on Image and Video Technology, pp. 682–694. Springer (2015)

  64. Niyogi, S., Adelson, E.: Analyzing and recognizing walking figures in XYT. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 469–474 (1994)

  65. Otsu, N.: A threshold selection method from gray-level histograms. Automatica 11(285–296), 23–27 (1975)

  66. Ozturk, O., Yamasaki, T., Aizawa, K.: Detecting dominant motion flows in unstructured/structured crowd scenes. In: 20th International Conference on Pattern Recognition, pp. 3533–3536 (2010)

  67. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  68. Piciarelli, C., Foresti, G.: On-line trajectory clustering for anomalous events detection. Pattern Recognit. Lett. 27(15), 1835–1842 (2006)

  69. Raja, K., Laptev, I., Pérez, P., Oisel, L.: Joint pose estimation and action recognition in image graphs. In: 18th IEEE International Conference on Image Processing, pp. 25–28. IEEE (2011)

  70. Ran, Y.: Symmetry in human motion analysis: theory and experiment. Ph.D. thesis, University of Maryland (2006)

  71. Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human computer interaction: a survey. Artif. Intell. Rev. 43(1), 1–54 (2015)

  72. da S. Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: Video-based face spoofing detection through visual rhythm analysis. In: 25th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 221–228 (2012)

  73. Saligrama, V.: Video anomaly detection based on local statistical aggregates. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2112–2119 (2012)

  74. Schindler, K., van Gool, L.: Action snippets: how many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)

  75. Schölkopf, B., Williamson, R.C., Smola, A.J., Shawe-Taylor, J., Platt, J.C.: Support vector method for novelty detection. NIPS 12, 582–588 (1999)

  76. Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: 17th International Conference on Pattern Recognition, vol. 3, pp. 32–36. IEEE (2004)

  77. Prison Service Order.: Display screen equipment health and safety issues. H.M. Prison Service (2000)

  78. Sobral, A., Vacavant, A.: A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Comput. Vis. Image Underst. 122, 4–21 (2014)

  79. Sun, X., Chen, M., Hauptmann, A.: Action recognition via local descriptors and holistic features. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 58–65. IEEE (2009)

  80. Suriani, N.S., Hussain, A., Zulkifley, M.A.: Sudden event recognition: a survey. Sensors 13(8), 9966–9998 (2013)

  81. Tang, X., Zhang, S., Yao, H.: Sparse coding based motion attention for abnormal event detection. In: 20th IEEE International Conference on Image Processing, pp. 3602–3606 (2013)

  82. Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1), 45–66 (2004)

  83. Thida, M., Eng, H.L., Remagnino, P.: Laplacian eigenmap with temporal constraints for local abnormality detection in crowded scenes. IEEE Trans. Cybern. 43(6), 2147–2156 (2013)

  84. Tung, P.T., Ngoc, L.Q.: Elliptical density shape model for hand gesture recognition. In: Fifth Symposium on Information and Communication Technology, pp. 186–191. ACM (2014)

  85. UMN—Detection of Unusual Crowd Dataset (2015) http://mha.cs.umn.edu/

  86. Valio, F.B., Pedrini, H., Leite, N.J.: Fast rotation-invariant video caption detection based on visual rhythm. In: San Martin, C., Kim, S.W. (eds.) Progress in pattern recognition, image analysis, computer vision, and applications, Lecture notes in computer science, pp. 157–164. Springer, Berlin, Heidelberg (2011)

  87. Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 29(10), 983–1009 (2013)

  88. Vo, D.H., Huynh, H.H., Meunier, J.: Geometry-based dynamic hand gesture recognition. J. Sci. Technol. 1, 13–19 (2015)

  89. Wallace, E., Diffley, C., Britain, G.: CCTV: Making it work: CCTV control room ergonomics. Publication (Great Britain. Home Office. Police Scientific Development Branch). Police Scientific Development Branch (1998)

  90. van der Walt, S., Colbert, S.C., Varoquaux, G.: The numpy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)

  91. van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T., the scikit-image contributors: scikit-image: image processing in Python. PeerJ 2, e453 (2014)

  92. Wang, S., Huang, K., Tan, T.: A compact optical flow based motion representation for real-time action recognition in surveillance scenes. In: 16th IEEE International Conference on Image Processing, pp. 1121–1124 (2009)

  93. Wang, T., Chen, J., Snoussi, H.: Online detection of abnormal events in video streams. J. Electr. Comput. Eng. 2013, 1–12 (2013)

  94. Wong, S.F., Cipolla, R.: Extracting spatiotemporal interest points using global information. In: IEEE 11th International Conference on Computer Vision, pp. 1–8. IEEE (2007)

  95. Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2411–2418 (2013)

  96. Yang, W., Wang, Y., Mori, G.: Human action recognition from a single clip per action. In: IEEE 12th International Conference on Computer Vision Workshops, pp. 482–489 (2009)

  97. Yu, M., Liu, L., Shao, L.: Structure-preserving binary representations for RGB-D action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38(8), 1651–1664 (2016)

  98. Zack, G., Rogers, W., Latt, S.: Automatic measurement of sister chromatid exchange frequency. J. Histochem. Cytochem. 25(7), 741–753 (1977)

  99. Zhang, Y., Qin, L., Yao, H., Huang, Q.: Abnormal crowd behavior detection based on social attribute-aware force model. In: 19th IEEE International Conference on Image Processing, pp. 2689–2692 (2012)

  100. Zhao, B., Fei-Fei, L., Xing, E.P.: Online detection of unusual events in videos via dynamic sparse coding. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3313–3320. IEEE Computer Society (2011)

Acknowledgments

The authors are grateful to FAPESP, CNPq and CAPES for the financial support.

Author information

Corresponding author

Correspondence to Helio Pedrini.

About this article

Cite this article

Torres, B.S., Pedrini, H. Detection of complex video events through visual rhythm. Vis Comput 34, 145–165 (2018). https://doi.org/10.1007/s00371-016-1321-1

  • DOI: https://doi.org/10.1007/s00371-016-1321-1
