Hierarchical Bayesian Multiple Kernel Learning Based Feature Fusion for Action Recognition

Sun, Wen; Yuan, Chunfeng; Wang, Pei; Yang, Shuang; Hu, Weiming; Cai, Zhaoquan

doi:10.1007/978-3-319-59259-6_8


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10183)

Included in the following conference series:

IAPR Workshop on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction


Abstract

Human action recognition is an area of increasing significance and has attracted much research attention in recent years. Fusing multiple features is intuitively an appropriate way to better recognize actions in videos, as a single type of feature cannot sufficiently capture the visual characteristics. However, most existing fusion methods used for action recognition fail to measure the contributions of different features and may not guarantee a performance improvement over the individual features. In this paper, we propose a new Hierarchical Bayesian Multiple Kernel Learning (HB-MKL) model to effectively fuse diverse types of features for action recognition. The model adaptively estimates the optimal weights of the base kernels constructed from different features to form a composite kernel. We evaluate the effectiveness of our method with complementary features capturing both appearance and motion information from the videos on challenging human action datasets, and the experimental results demonstrate the potential of HB-MKL for action recognition.
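
As a rough illustration of the composite kernel mentioned in the abstract, the sketch below forms a weighted sum of base Gram matrices, one per feature type. The RBF base kernel, the toy feature vectors, the function names, and the fixed weights are all assumptions made for illustration. This is not the HB-MKL inference procedure, which would estimate the weights from data.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - y||^2) over the rows of X."""
    n = len(X)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def composite_kernel(base_kernels, weights):
    """Weighted sum of base Gram matrices: K = sum_m weights[m] * K_m."""
    if len(base_kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    n = len(base_kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, base_kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three videos, each described by two feature types.
appearance = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
motion = [[1.0], [0.0], [0.9]]

K_app = rbf_kernel(appearance, gamma=1.0)
K_mot = rbf_kernel(motion, gamma=1.0)

# Fixed, hand-picked weights (an assumption); a learned model would infer them.
weights = [0.7, 0.3]
K = composite_kernel([K_app, K_mot], weights)
```

With non-negative weights, the composite matrix stays symmetric and positive semidefinite, so it can be passed to any kernel classifier that accepts a precomputed Gram matrix.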


Acknowledgments

This work is partly supported by the 973 basic research program of China (Grant No. 2014CB349303), the Natural Science Foundation of China (Grant No. 61472421, U1636218, 61472420, 61370185, 61170193, 61472063), the Strategic Priority Research Program of the CAS (Grant No. XDB02070003), the Natural Science Foundation of Guangdong Province (Grant No. S2013010013432, S2013010015940), and the CAS External cooperation key project.

Author information

Authors and Affiliations

CAS Center for Excellence in Brain Science and Intelligence Technology, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Wen Sun, Chunfeng Yuan, Pei Wang, Shuang Yang & Weiming Hu
Huizhou University, Huizhou, Guangdong, China
Zhaoquan Cai

Corresponding author

Correspondence to Chunfeng Yuan.

Editor information

Editors and Affiliations

Universität Ulm, Ulm, Germany
Friedhelm Schwenker
Multimodal Communication and Computation, University of Southern California, Playa Vista, California, USA
Stefan Scherer

Cite this paper

Sun, W., Yuan, C., Wang, P., Yang, S., Hu, W., Cai, Z. (2017). Hierarchical Bayesian Multiple Kernel Learning Based Feature Fusion for Action Recognition. In: Schwenker, F., Scherer, S. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2016. Lecture Notes in Computer Science, vol 10183. Springer, Cham. https://doi.org/10.1007/978-3-319-59259-6_8

DOI: https://doi.org/10.1007/978-3-319-59259-6_8
Published: 01 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59258-9
Online ISBN: 978-3-319-59259-6
eBook Packages: Computer Science, Computer Science (R0)
