Abstract
Human action recognition is an area of increasing significance and has attracted much research attention in recent years. Fusing multiple features is intuitively an appropriate way to better recognize actions in videos, as a single type of feature cannot sufficiently capture the visual characteristics. However, most existing fusion methods for action recognition fail to measure the contributions of different features and may not guarantee a performance improvement over the individual features. In this paper, we propose a new Hierarchical Bayesian Multiple Kernel Learning (HB-MKL) model to effectively fuse diverse types of features for action recognition. The model adaptively estimates the optimal weights of the base kernels constructed from different features to form a composite kernel. We evaluate the effectiveness of our method with complementary features capturing both appearance and motion information from the videos on challenging human action datasets, and the experimental results demonstrate the potential of HB-MKL for action recognition.
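To make the idea of a composite kernel concrete, the following is a minimal sketch, assuming the common multiple kernel learning form in which the composite Gram matrix is a weighted sum of base Gram matrices, one per feature type. It is not the HB-MKL model itself: the weights below are fixed by hand, whereas the abstract states that HB-MKL estimates them adaptively. The RBF kernel, the function names `rbf_kernel` and `composite_kernel`, and the toy descriptor values are all illustrative assumptions, not taken from the paper.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix of an RBF kernel over a list of feature vectors X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def composite_kernel(base_kernels, weights):
    """Weighted sum of base Gram matrices: K = sum_m w_m * K_m."""
    if len(base_kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    n = len(base_kernels[0])
    K = [[0.0] * n for _ in range(n)]
    for K_m, w_m in zip(base_kernels, weights):
        for i in range(n):
            for j in range(n):
                K[i][j] += w_m * K_m[i][j]
    return K


# Hypothetical descriptors for three videos, one list per feature type.
appearance = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
motion = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]

# One base kernel per feature type.
K_app = rbf_kernel(appearance, gamma=1.0)
K_mot = rbf_kernel(motion, gamma=1.0)

# Hand-picked weights (assumption); a learned model would infer these from data.
weights = [0.7, 0.3]
K = composite_kernel([K_app, K_mot], weights)
```

The resulting matrix `K` could then be passed to any kernel classifier that accepts a precomputed Gram matrix. Changing `weights` shifts how much each feature type influences the similarity between two videos, which is the quantity a kernel-weight learning method has to determine.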
Acknowledgments
This work is partly supported by the 973 basic research program of China (Grant No. 2014CB349303), the Natural Science Foundation of China (Grant Nos. 61472421, U1636218, 61472420, 61370185, 61170193, 61472063), the Strategic Priority Research Program of the CAS (Grant No. XDB02070003), the Natural Science Foundation of Guangdong Province (Grant Nos. S2013010013432, S2013010015940), and the CAS External cooperation key project.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sun, W., Yuan, C., Wang, P., Yang, S., Hu, W., Cai, Z. (2017). Hierarchical Bayesian Multiple Kernel Learning Based Feature Fusion for Action Recognition. In: Schwenker, F., Scherer, S. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2016. Lecture Notes in Computer Science, vol 10183. Springer, Cham. https://doi.org/10.1007/978-3-319-59259-6_8
DOI: https://doi.org/10.1007/978-3-319-59259-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59258-9
Online ISBN: 978-3-319-59259-6
eBook Packages: Computer Science, Computer Science (R0)