Abstract
Most recent work leverages the two-stream framework to model spatiotemporal information for video action recognition and achieves remarkable performance. In this paper, we propose a novel convolutional architecture, called the Residual Gating Fusion Network (RGFN), that improves on these methods by fully exploiting the spatiotemporal information in residual signals. To further exploit the local details of low-level layers, we introduce Multi-Scale Convolution Fusion (MSCF), which performs spatiotemporal fusion at multiple levels. Since RGFN is an end-to-end network, it can be trained on a wide range of video datasets and applied to other video analysis tasks. We evaluate RGFN on two standard benchmarks, UCF101 and HMDB51, and analyze the design of the network. Experimental results demonstrate the advantages of RGFN, which achieves state-of-the-art performance.
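The abstract names gated residual fusion of the two streams as the core mechanism but does not give the exact formulation. The following is only a minimal NumPy sketch of the general idea of gated residual fusion, under the assumption that a learned gate modulates, per element, how much of the temporal (motion) stream's signal is added residually to the spatial stream; the 1×1 channel-mixing matrix `w_gate` is a hypothetical stand-in for the learned gating parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_residual_fusion(spatial, temporal, w_gate):
    """Fuse a temporal feature map into a spatial stream as a gated
    residual signal: the gate decides, per element, how much of the
    temporal information is added to the spatial features.

    spatial, temporal : feature maps of shape (C, H, W)
    w_gate            : 1x1-conv-style channel-mixing matrix, shape (C, C)
    """
    c, h, w = temporal.shape
    # 1x1 convolution over channels produces the gate logits
    logits = (w_gate @ temporal.reshape(c, -1)).reshape(c, h, w)
    gate = sigmoid(logits)            # values in (0, 1)
    return spatial + gate * temporal  # gated residual fusion

rng = np.random.default_rng(0)
spatial = rng.standard_normal((8, 4, 4))
temporal = rng.standard_normal((8, 4, 4))
w_gate = rng.standard_normal((8, 8)) * 0.1
fused = gated_residual_fusion(spatial, temporal, w_gate)
print(fused.shape)  # (8, 4, 4)
```

Because the gate lies in (0, 1), the fused output never deviates from the spatial stream by more than the magnitude of the temporal signal, so the spatial pathway remains an identity-like shortcut in the spirit of residual learning.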
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grant 61673402, Grant 61273270, and Grant 60802069, in part by the Natural Science Foundation of Guangdong under Grant 2017A030311029, Grant 2016B010109002, Grant 2015B090912001, Grant 2016B010123005, and Grant 2017B090909005, in part by the Science and Technology Program of Guangzhou under Grant 201704020180 and Grant 201604020024, and in part by the Fundamental Research Funds for the Central Universities of China.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Zhang, J., Hu, H. (2018). Residual Gating Fusion Network for Human Action Recognition. In: Zhou, J., et al. Biometric Recognition. CCBR 2018. Lecture Notes in Computer Science, vol. 10996. Springer, Cham. https://doi.org/10.1007/978-3-319-97909-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97908-3
Online ISBN: 978-3-319-97909-0