Global and Local Spatial-Attention Network for Isolated Gesture Recognition

Yuan, Qi; Wan, Jun; Lin, Chi; Li, Yunan; Miao, Qiguang; Li, Stan Z.; Wang, Lihua; Lu, Yunxiang

doi:10.1007/978-3-030-31456-9_10

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11818)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

In this paper, we focus on isolated gesture recognition from RGB-D videos. Our main idea is to design an algorithm that can extract global and local information from multi-modality inputs. To this end, we propose a novel attention-based method that uses a 3D convolutional neural network (CNN) to recognize isolated gestures. It has two parts. The first is a global and local spatial-attention network (GLSANet), which extracts efficient features from multi-modality inputs simultaneously by combining global information, which captures the context of the frame, with local information, which captures the hand and arm actions of the person. The second is an adaptive model fusion strategy that fuses the probabilities predicted from the multi-modality inputs. Experiments demonstrate that the proposed method achieves state-of-the-art performance on the IsoGD dataset.
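The fusion step in the second part can be illustrated with a generic weighted late fusion of per-modality class probabilities. This is only a sketch under stated assumptions: the abstract does not specify how the adaptive weights are chosen, so the function names, the example scores, and the weights below (imagined here as each model's validation accuracy) are all hypothetical and are not the authors' strategy.

```python
from typing import Dict, List


def fuse_probabilities(
    probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities (late fusion)."""
    if not probs:
        raise ValueError("need at least one modality")
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    num_classes = len(next(iter(probs.values())))
    fused = [0.0] * num_classes
    for name, p in probs.items():
        if len(p) != num_classes:
            raise ValueError("all modalities must score the same classes")
        w = weights[name] / total  # normalise so the fused vector sums to 1
        for k, value in enumerate(p):
            fused[k] += w * value
    return fused


def predict(fused: List[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for a 3-class problem from two modalities.
example_probs = {
    "rgb": [0.6, 0.3, 0.1],
    "depth": [0.2, 0.7, 0.1],
}
# Hypothetical weights (assumption: each model's validation accuracy).
example_weights = {"rgb": 0.5, "depth": 0.6}

fused = fuse_probabilities(example_probs, example_weights)
label = predict(fused)
```

In this toy example the depth model carries slightly more weight, so the fused prediction follows its preferred class. Any scheme that learns or tunes the weights per modality would slot into the same `weights` argument.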

Acknowledgments

This work has been partially supported by the Chinese National Natural Science Foundation Projects #61876179, #61872367, and by Science and Technology Development Fund of Macau (Grant No. 0025/2018/A1). We acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.

Author information

Authors and Affiliations

School of Software, Beihang University, Beijing, China
Qi Yuan, Lihua Wang & Yunxiang Lu
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jun Wan & Stan Z. Li
University of Southern California, Los Angeles, CA, USA
Chi Lin
School of Computer Science and Technology, Xidian University & Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, China
Yunan Li & Qiguang Miao

Corresponding author

Correspondence to Jun Wan.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Chinese Academy of Sciences, Beijing, China
Ran He
Tsinghua University, Beijing, China
Jianjiang Feng
Chinese Academy of Sciences, Beijing, China
Shiguang Shan
Tsinghua University, Shenzhen, China
Zhenhua Guo

About this paper

Cite this paper

Yuan, Q. et al. (2019). Global and Local Spatial-Attention Network for Isolated Gesture Recognition. In: Sun, Z., He, R., Feng, J., Shan, S., Guo, Z. (eds) Biometric Recognition. CCBR 2019. Lecture Notes in Computer Science, vol 11818. Springer, Cham. https://doi.org/10.1007/978-3-030-31456-9_10

DOI: https://doi.org/10.1007/978-3-030-31456-9_10
Published: 05 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31455-2
Online ISBN: 978-3-030-31456-9
eBook Packages: Computer Science; Computer Science (R0)
