Abstract
This paper presents a kernel-based relevance analysis for video data to support social behavior recognition. Our approach, termed KRAV, has two stages: (i) a feature ranking based on centered kernel alignment (CKA), which matches social semantic features with the output labels (individual and group behaviors); the employed method extends the conventional CKA to mitigate the imbalance effect of unusual human behaviors; and (ii) a classification stage that performs the behavior prediction. For testing, KRAV is assessed on the Israel Institute of Technology social behavior database under a tenfold cross-validation scheme. The results show that the proposed approach obtains an \(F_1\) measure of 0.5925 for the individual recognition task using 50 relevant features, and an \(F_1\) measure of 0.8094 for the group recognition task using 12 relevant features. In both cases, it outperforms state-of-the-art results in terms of classification performance and number of employed features. In addition, the set of selected features could assist further social behavior analysis concerning the recognition of individual profiles and group behaviors.
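To make the feature-ranking idea in stage (i) concrete, the following is a minimal, dependency-free sketch of the conventional CKA score used to rank features one at a time against class labels. It is not the extended, imbalance-aware CKA that the paper proposes, and it is not the authors' implementation. The function names (`gaussian_kernel`, `label_kernel`, `center`, `cka`, `rank_features`), the Gaussian kernel with a median-heuristic bandwidth, and the per-feature scoring scheme are all assumptions made for illustration.

```python
import math
from statistics import median


def gaussian_kernel(values, sigma=None):
    """Gaussian kernel matrix for one scalar feature (a list of floats)."""
    n = len(values)
    d2 = [[(values[i] - values[j]) ** 2 for j in range(n)] for i in range(n)]
    if sigma is None:
        # Assumption: median heuristic over nonzero squared distances,
        # not a bandwidth rule taken from the paper.
        positive = [d for row in d2 for d in row if d > 0]
        sigma = math.sqrt(median(positive)) if positive else 1.0
    return [[math.exp(-d / (2.0 * sigma ** 2)) for d in row] for row in d2]


def label_kernel(labels):
    """Delta kernel: 1.0 when two samples share a label, 0.0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def center(k):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - 11^T/n."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]
    col_mean = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [k[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def cka(k, l):
    """Conventional centered kernel alignment between two kernel matrices."""
    kc, lc = center(k), center(l)
    den = math.sqrt(frobenius_inner(kc, kc) * frobenius_inner(lc, lc))
    return frobenius_inner(kc, lc) / den if den > 0 else 0.0


def rank_features(samples, labels):
    """Score each feature by its CKA with the label kernel; sort descending."""
    l = label_kernel(labels)
    n_features = len(samples[0])
    scores = [
        cka(gaussian_kernel([row[j] for row in samples]), l)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Calling `rank_features(samples, labels)` with `samples` as a list of equal-length feature rows returns the feature indices sorted by decreasing alignment together with the raw scores. Keeping the first few indices of `order` is one simple way to choose a reduced feature set before the classification stage.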
Notes
Performed by the INESC TEC, Group of Portugal and supervised by the lab of social-psychology of the University of Porto—http://sigarra.up.pt/fpceup.
Acknowledgements
This work was carried out under grants provided by the Colciencias project “Caracterización morfológica de estructuras cerebrales por técnicas de imagen para el tratamiento mediante implantación quirúrgica de neuroestimuladores en la enfermedad de Parkinson”, CV 110180763808, Colciencias. J. Fernández-Ramírez is partially funded by the project “Sistema de clasificación de videos basado en técnicas de representación utilizando métodos núcleo e inferencia bayesiana”, CV E6-19-2, from the VIIE-UTP, and by the Colciencias program Jóvenes investigadores e innovadores, Convocatoria 812 de 2018.
Cite this article
Fernández-Ramírez, J., Álvarez-Meza, A., Pereira, E.M. et al. Video-based social behavior recognition based on kernel relevance analysis. Vis Comput 36, 1535–1547 (2020). https://doi.org/10.1007/s00371-019-01754-y
DOI: https://doi.org/10.1007/s00371-019-01754-y