Boxless Action Recognition in Still Images via Recurrent Visual Attention

Feng, Weijiang; Zhang, Xiang; Huang, Xuhui; Luo, Zhigang

doi:10.1007/978-3-319-70096-0_68

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10635)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Boxless action recognition in still images is the task of recognizing human actions when no ground-truth bounding boxes are available, which makes it more challenging than traditional action recognition. To address this task, AttSPP-net integrates soft attention and spatial pyramid pooling into a convolutional neural network and achieves recognition accuracy comparable even to some bounding-box-based approaches. However, the soft attention of AttSPP-net concentrates on a single fixation, whereas human visual attention combines information from different fixations over time. In this paper, we take inspiration from this mechanism and propose ReAttSPP-net for boxless action recognition. ReAttSPP-net uses a recurrent neural network model of visual attention to extract information from a sequence of fixations. Experiments on three public action recognition benchmarks (PASCAL VOC 2012, Willow, and Sports) show that ReAttSPP-net achieves promising results and higher recognition performance than AttSPP-net.
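
The contrast between a single soft-attention fixation and a sequence of fixations can be illustrated with a small toy sketch. This is not the ReAttSPP-net architecture: it omits the convolutional network and spatial pyramid pooling, and it assumes a plain tanh recurrence and dot-product attention scores. All names here (`attend`, `recurrent_attention`, `w_h`, `w_x`) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """One soft-attention fixation: weight every location by softmax(feature . query)."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    # The glimpse is the attention-weighted average of the location features.
    glimpse = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return glimpse, weights


def recurrent_attention(features, w_h, w_x, steps):
    """Run `steps` fixations; the hidden state after one fixation is the query for the next."""
    dim = len(features[0])
    hidden = [0.0] * dim  # zero query, so the first fixation is uniform
    history = []
    for _ in range(steps):
        glimpse, weights = attend(features, hidden)
        # Assumed update rule: hidden = tanh(w_h @ hidden + w_x @ glimpse).
        hidden = [
            math.tanh(
                sum(w_h[i][j] * hidden[j] for j in range(dim))
                + sum(w_x[i][j] * glimpse[j] for j in range(dim))
            )
            for i in range(dim)
        ]
        history.append(weights)
    return hidden, history


# Toy input: a 3x3 grid of locations (flattened to 9), each with a 4-dim feature.
random.seed(0)
features = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(9)]
w_h = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]
w_x = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]

final_state, fixations = recurrent_attention(features, w_h, w_x, steps=3)
```

With `steps=1` the sketch reduces to a single soft-attention fixation. With more steps, `final_state` accumulates information from several attention maps, which is the idea the abstract attributes to human visual attention. In a real model the weights would be learned rather than random.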

Acknowledgments

This work is supported by the National High Technology Research and Development Program (grant No. 2015AA020108) and the National Natural Science Foundation of China (grant No. U1435222).

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Weijiang Feng, Xiang Zhang, Xuhui Huang & Zhigang Luo

Corresponding authors

Correspondence to Xiang Zhang or Zhigang Luo.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

About this paper

Cite this paper

Feng, W., Zhang, X., Huang, X., Luo, Z. (2017). Boxless Action Recognition in Still Images via Recurrent Visual Attention. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_68

DOI: https://doi.org/10.1007/978-3-319-70096-0_68
Published: 26 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70095-3
Online ISBN: 978-3-319-70096-0
eBook Packages: Computer Science, Computer Science (R0)
