Abstract
The recognition of complex events in videos currently has several important applications, particularly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, hospitals, schools, buildings, and roads. Advances in digital technology have improved the detection of video events through devices with high resolution, small physical size, and high sampling rates. This work presents and evaluates the use of feature descriptors extracted from visual rhythms of video sequences in three computer vision problems: abnormal event detection, human action classification, and gesture recognition. Experiments conducted on well-known public datasets show that the method produces promising results.
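A visual rhythm is commonly built by sampling the same one-dimensional slice (for example, a row, a column, or the main diagonal) from every frame and stacking these slices over time into a single 2D image. The sketch below illustrates that construction under stated assumptions: grayscale frames stored as a NumPy array of shape (T, H, W), and a hypothetical function `visual_rhythm` with hypothetical `mode` values. The slice choices shown are illustrative and are not necessarily the ones used in this work.

```python
import numpy as np


def visual_rhythm(frames: np.ndarray, mode: str = "horizontal") -> np.ndarray:
    """Build a visual rhythm image from a grayscale video (illustrative sketch).

    frames: array of shape (T, H, W), one grayscale frame per time step.
    mode:   which 1D slice to take from each frame; the names
            "horizontal", "vertical" and "diagonal" are hypothetical.
    Returns a 2D array of shape (L, T): column k is the slice from frame k.
    """
    if frames.ndim != 3:
        raise ValueError("expected a (T, H, W) array of grayscale frames")
    _, h, w = frames.shape
    if mode == "horizontal":
        # Central row of every frame -> shape (T, W).
        slices = frames[:, h // 2, :]
    elif mode == "vertical":
        # Central column of every frame -> shape (T, H).
        slices = frames[:, :, w // 2]
    elif mode == "diagonal":
        # Main diagonal of every frame -> shape (T, min(H, W)).
        idx = np.arange(min(h, w))
        slices = frames[:, idx, idx]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Transpose so that time runs along the horizontal axis of the image.
    return slices.T
```

For example, `visual_rhythm(np.zeros((100, 120, 160)), mode="horizontal")` returns an array of shape (160, 100): one 160-pixel row per frame, laid out as 100 columns. Putting time on the horizontal axis turns motion across the sampled line into 2D patterns, so the result can be passed to an ordinary 2D image descriptor.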
Notes
Although some efforts have been made to differentiate among these terms, in this work abnormal events are assumed to be similar to unusual, rare, suspicious, anomalous, irregular, outlying, or atypical events.
Normalcy or normality is the state of being normal or usual.
An object that satisfies the principle of distance conservation [54].
Many works in the literature use the first k frames for training, with k ranging from 200 to 300.
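The training protocol mentioned in the last note (fitting a model only on the first k frames, which are assumed to show normal activity, and testing on the remaining frames) can be sketched as follows. This is a hedged illustration, not the method of this work: it assumes one feature vector per frame and swaps in a one-class SVM (the novelty-detection method of Schölkopf et al., available in scikit-learn as `sklearn.svm.OneClassSVM`) as the normality model. The helper names `split_first_k` and `score_frames`, and the defaults k = 250 and nu = 0.1, are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def split_first_k(features: np.ndarray, k: int):
    """Hypothetical helper: first k frames for training, the rest for testing."""
    if not 0 < k < len(features):
        raise ValueError("k must leave at least one training and one test frame")
    return features[:k], features[k:]


def score_frames(features: np.ndarray, k: int = 250, nu: float = 0.1) -> np.ndarray:
    """Fit a one-class SVM on the first k frames and flag outliers in the rest.

    features: array of shape (T, D), one D-dimensional descriptor per frame
              (assumed to be precomputed, e.g. from visual rhythms).
    Returns a boolean array of length T - k; True marks a frame predicted abnormal.
    """
    train, test = split_first_k(features, k)
    # nu bounds the fraction of training frames treated as outliers.
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    model.fit(train)
    # predict() returns +1 for inliers and -1 for outliers.
    return model.predict(test) == -1
```

Only normal frames are needed at training time, which matches the usual setting of abnormal event detection, where abnormal examples are rare or unavailable in advance.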
References
Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 16 (2011)
Alcantara, M., Moreira, T., Pedrini, H.: Real-time action recognition based on cumulative motion shapes. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2917–2921 (2014)
Alcantara, M., Moreira, T., Pedrini, H.: Real-time action recognition using a multilayer descriptor with variable size. J. Electron. Imaging 25(1), 013020 (2016)
Almotairi, S.M.: Using variations of shape and appearance in alignment methods for classifying human actions. Florida Institute of Technology, Melbourne (2014)
Antonucci, A., De Rosa, R., Giusti, A., Cuzzolin, F.: Robust classification of multivariate time series by imprecise hidden Markov models. Int. J. Approx. Reason. 56, 249–263 (2015)
Berent, J., Dragotti, P.: Segmentation of epipolar-plane image volumes with occlusion and disocclusion competition. In: IEEE 8th Workshop on Multimedia Signal Processing, pp. 182–185 (2006)
Biswas, S., Babu, R.V.: Real time anomaly detection in H.264 compressed videos. In: Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, pp. 1–4. IEEE (2013)
Blackburn, J., Ribeiro, E.: Human motion recognition using isomap and dynamic time warping. In: Human Motion: Understanding, Modeling, Capture and Animation, pp. 285–298. Springer (2007)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: International Conference on Computer Vision, Beijing, pp. 1395–1402 (2005)
Bolles, R.C., Baker, H.H.: Epipolar-plane image analysis: a technique for analyzing motion sequences. In: 3rd IEEE Workshop on Computer Vision, Representation, and Control, pp. 168–178. IEEE (1985)
Boughorbel, S., Tarel, J.P., Boujemaa, N.: Generalized histogram intersection kernel for image recognition. In: IEEE International Conference on Image Processing, vol. 3, pp. III–161. IEEE (2005)
Bourke, A., O’Brien, J., Lyons, G.: Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm. Gait Posture 26(2), 194–199 (2007)
Buch, N., Velastin, S., Orwell, J.: A review of computer vision techniques for the analysis of urban traffic. IEEE Trans. Intell. Transp. Syst. 12(3), 920–939 (2011)
Calderara, S., Heinemann, U., Prati, A., Cucchiara, R., Tishby, N.: Detecting anomalies in people’s trajectories using spectral graph analysis. Comput. Vis. Image Underst. 115(8), 1099–1111 (2011)
Candamo, J., Shreve, M., Goldgof, D., Sapper, D., Kasturi, R.: Understanding transit scenes: a survey on human behavior-recognition algorithms. IEEE Trans. Intell. Transp. Syst. 11(1), 206–224 (2010)
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)
Chen, D.Y., Huang, P.C.: Motion-based unusual event detection in human crowds. J. Vis. Commun. Image Represent. 22(2), 178–186 (2011)
Cong, Y., Yuan, J., Liu, J.: Abnormal event detection in crowded scenes using sparse representation. Pattern Recognit. 46(7), 1851–1864 (2013)
Cui, L., Li, K., Chen, J., Li, Z.: Abnormal event detection in traffic video surveillance based on local features. In: 4th International Congress on Image and Signal Processing, vol. 1, pp. 362–366. IEEE (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)
De Rosa, R., Cesa-Bianchi, N., Gori, I., Cuzzolin, F.: Online action recognition via nonparametric incremental learning. In: British Machine Vision Conference. BMVA Press (2014)
Dee, H.M., Velastin, S.A.: How close are we to solving the problem of automated visual surveillance? Mach. Vis. Appl. 19(5–6), 329–343 (2007)
Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Del Bimbo, A.: Space-time pose representation for 3D human action recognition. In: International Conference on Image Analysis and Processing, pp. 456–464. Springer (2013)
Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72. IEEE (2005)
Duda, R.O., Hart, P.E.: Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 15(1), 11–15 (1972)
Fanello, S.R., Gori, I., Metta, G., Odone, F.: Keep it simple and sparse: real-time action recognition. J. Mach. Learn. Res. 14(1), 2617–2640 (2013)
Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Image Analysis, pp. 363–370. Springer (2003)
Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
Fawzy, F., Abdelwahab, M., Mikhael, W.: 2DHOOF-2DPCA contour based optical flow algorithm for human activity recognition. In: IEEE 56th International Midwest Symposium on Circuits and Systems, pp. 1310–1313 (2013)
Feng, J., Zhang, C., Hao, P.: Online learning with self-organizing maps for anomaly detection in crowd scenes. In: 20th International Conference on Pattern Recognition, vol. 1, pp. 3599–3602 (2010)
Fortun, D., Bouthemy, P., Kervrann, C.: Optical flow modeling and computation: a survey. Comput. Vis. Image Underst. 134, 1–21 (2015)
Gong, S., Xiang, T.: Person re-identification. In: Visual Analysis of Behaviour, pp. 301–313. Springer (2011)
Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)
Guimarães, S., de A.-Araújo, A., Couprie, M., Leite, N.: An approach to detect video transitions based on mathematical morphology. In: International Conference on Image Processing, vol. 3, pp. II–1021–4 (2003)
Guo, K., Ishwar, P., Konrad, J.: Action recognition from video using feature covariance matrices. IEEE Trans. Image Process. 22(6), 2479–2494 (2013)
Horn, B.K., Schunck, B.G.: Determining optical flow. In: 1981 Technical Symposium East, pp. 319–331. International Society for Optics and Photonics (1981)
Hung, T.Y., Lu, J., Tan, Y.P.: Cross-scene abnormal event detection. In: IEEE International Symposium on Circuits and Systems, pp. 2844–2847 (2013)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Jiang, F., Yuan, J., Tsaftaris, S.A., Katsaggelos, A.K.: Anomalous video event detection using spatiotemporal context. Comput. Vis. Image Underst. 115(3), 323–333 (2011)
Jiang, X., Zhong, F., Peng, Q., Qin, X.: Online robust action recognition based on a hierarchical model. Vis. Comput. 30(9), 1021–1033 (2014)
Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: Tenth IEEE International Conference on Computer Vision, vol. 1, pp. 166–173. IEEE (2005)
Kliper-Gross, O., Hassner, T., Wolf, L.: The action similarity labeling challenge. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 615–621 (2012)
Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64(2–3), 107–123 (2005)
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 18–32 (2014)
Liu, L., Shao, L.: Learning discriminative representations from RGB-D video data. In: International Joint Conference on Artificial Intelligence, vol. 1, p. 3 (2013)
Lowe, D.: Object recognition from local scale-invariant features. In: Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157 (1999)
Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: 7th International Joint Conference on Artificial Intelligence, pp. 674–679 (1981)
Mahadevan, V., Li, W., Bhalodia, V., Vasconcelos, N.: Anomaly detection in crowded scenes. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1975–1981 (2010)
Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: 26th Annual International Conference on Machine Learning, pp. 689–696. ACM (2009)
McCahill, M., Norris, C.: CCTV systems in London: their structures and practices. Tech. rep., Centre for Criminology and Criminal Justice, University of Hull (2003)
Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 935–942 (2009)
Mitiche, A.: Rigid body kinematics: some basic notions. In: Computational Analysis of Visual Motion. Advances in Computer Vision and Machine Intelligence, pp. 31–43. Springer, US (1994)
Molchanov, P., Yang, X., Gupta, S., Kim, K., Tyree, S., Kautz, J.: Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4207–4215 (2016)
Moreira, T., Alcantara, M., Pedrini, H., Menotti, D.: Fast and accurate gesture recognition based on motion shapes. In: Iberoamerican Congress on Pattern Recognition, pp. 247–254. Springer (2015)
Nalwa, V.S.: A Guided Tour of Computer Vision. Addison-Wesley Longman Publishing Co. Inc., Boston (1993)
Nam, Y.: Crowd flux analysis and abnormal event detection in unstructured and structured scenes. Multimed. Tools Appl. 72(3), 3001–3029 (2014)
Ngo, C., Pong, T., Chin, R.: Detection of gradual transitions through temporal slice analysis. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1 (1999)
Ngo, C.W., Pong, T.C., Zhang, H.J.: Motion analysis and segmentation through spatio-temporal slices processing. IEEE Trans. Image Process. 12(3), 341–355 (2003)
Niebles, J.C., Fei-Fei, L.: A hierarchical model of shape and appearance for human action classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. 79(3), 299–318 (2008)
Nishida, N., Nakayama, H.: Multimodal gesture recognition using multi-stream recurrent neural network. In: Pacific-Rim Symposium on Image and Video Technology, pp. 682–694. Springer (2015)
Niyogi, S., Adelson, E.: Analyzing and recognizing walking figures in XYT. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 469–474 (1994)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Ozturk, O., Yamasaki, T., Aizawa, K.: Detecting dominant motion flows in unstructured/structured crowd scenes. In: 20th International Conference on Pattern Recognition, pp. 3533–3536 (2010)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Piciarelli, C., Foresti, G.: On-line trajectory clustering for anomalous events detection. Pattern Recognit. Lett. 27(15), 1835–1842 (2006)
Raja, K., Laptev, I., Pérez, P., Oisel, L.: Joint pose estimation and action recognition in image graphs. In: 18th IEEE International Conference on Image Processing, pp. 25–28. IEEE (2011)
Ran, Y.: Symmetry in human motion analysis: theory and experiment. Ph.D. thesis, University of Maryland (2006)
Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human computer interaction: a survey. Artif. Intell. Rev. 43(1), 1–54 (2015)
da S. Pinto, A., Pedrini, H., Schwartz, W., Rocha, A.: Video-based face spoofing detection through visual rhythm analysis. In: 25th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 221–228 (2012)
Saligrama, V.: Video anomaly detection based on local statistical aggregates. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2112–2119 (2012)
Schindler, K., van Gool, L.: Action snippets: how many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Schölkopf, B., Williamson, R.C., Smola, A.J., Shawe-Taylor, J., Platt, J.C.: Support vector method for novelty detection. NIPS 12, 582–588 (1999)
Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: 17th International Conference on Pattern Recognition, vol. 3, pp. 32–36. IEEE (2004)
Prison Service Order: Display screen equipment health and safety issues. H.M. Prison Service (2000)
Sobral, A., Vacavant, A.: A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Comput. Vis. Image Underst. 122, 4–21 (2014)
Sun, X., Chen, M., Hauptmann, A.: Action recognition via local descriptors and holistic features. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 58–65. IEEE (2009)
Suriani, N.S., Hussain, A., Zulkifley, M.A.: Sudden event recognition: a survey. Sensors 13(8), 9966–9998 (2013)
Tang, X., Zhang, S., Yao, H.: Sparse coding based motion attention for abnormal event detection. In: 20th IEEE International Conference on Image Processing, pp. 3602–3606 (2013)
Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1), 45–66 (2004)
Thida, M., Eng, H.L., Remagnino, P.: Laplacian eigenmap with temporal constraints for local abnormality detection in crowded scenes. IEEE Trans. Cybern. 43(6), 2147–2156 (2013)
Tung, P.T., Ngoc, L.Q.: Elliptical density shape model for hand gesture recognition. In: Fifth Symposium on Information and Communication Technology, pp. 186–191. ACM (2014)
UMN: Detection of Unusual Crowd Dataset (2015). http://mha.cs.umn.edu/
Valio, F.B., Pedrini, H., Leite, N.J.: Fast rotation-invariant video caption detection based on visual rhythm. In: San Martin, C., Kim, S.W. (eds.) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Lecture Notes in Computer Science, pp. 157–164. Springer, Berlin, Heidelberg (2011)
Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 29(10), 983–1009 (2013)
Vo, D.H., Huynh, H.H., Meunier, J.: Geometry-based dynamic hand gesture recognition. J. Sci. Technol. 1, 13–19 (2015)
Wallace, E., Diffley, C.: CCTV: making it work: CCTV control room ergonomics. Publication of the Police Scientific Development Branch, Home Office, Great Britain (1998)
van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)
van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T., the scikit-image contributors: scikit-image: image processing in Python. PeerJ 2, e453 (2014)
Wang, S., Huang, K., Tan, T.: A compact optical flow based motion representation for real-time action recognition in surveillance scenes. In: 16th IEEE International Conference on Image Processing, pp. 1121–1124 (2009)
Wang, T., Chen, J., Snoussi, H.: Online detection of abnormal events in video streams. J. Electr. Comput. Eng. 2013, 1–12 (2013)
Wong, S.F., Cipolla, R.: Extracting spatiotemporal interest points using global information. In: IEEE 11th International Conference on Computer Vision, pp. 1–8. IEEE (2007)
Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2411–2418 (2013)
Yang, W., Wang, Y., Mori, G.: Human action recognition from a single clip per action. In: IEEE 12th International Conference on Computer Vision Workshops, pp. 482–489 (2009)
Yu, M., Liu, L., Shao, L.: Structure-preserving binary representations for RGB-D action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38(8), 1651–1664 (2016)
Zack, G., Rogers, W., Latt, S.: Automatic measurement of sister chromatid exchange frequency. J. Histochem. Cytochem. 25(7), 741–753 (1977)
Zhang, Y., Qin, L., Yao, H., Huang, Q.: Abnormal crowd behavior detection based on social attribute-aware force model. In: 19th IEEE International Conference on Image Processing, pp. 2689–2692 (2012)
Zhao, B., Fei-Fei, L., Xing, E.P.: Online detection of unusual events in videos via dynamic sparse coding. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3313–3320. IEEE Computer Society (2011)
Acknowledgments
The authors are grateful to FAPESP, CNPq and CAPES for the financial support.
Cite this article
Torres, B.S., Pedrini, H. Detection of complex video events through visual rhythm. Vis Comput 34, 145–165 (2018). https://doi.org/10.1007/s00371-016-1321-1
DOI: https://doi.org/10.1007/s00371-016-1321-1