What and Where to See: Deep Attention Aggregation Network for Action Detection

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2022)

Abstract

With the development of deep convolutional neural networks, 2D CNNs are widely used in action detection tasks. Although 2D CNNs extract rich features from video frames, these features also contain redundant information. In response to this problem, we propose the Residual Channel-Spatial Attention (RCSA) module to guide the network on what (object patterns) and where (spatially) to focus. Meanwhile, to effectively utilize the rich spatial and semantic features extracted by different layers of deep networks, we combine RCSA with a deep aggregation network to propose the Deep Attention Aggregation Network. Experimental results on two datasets, J-HMDB and UCF-101, show that the proposed network achieves state-of-the-art performance on action detection.
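The chapter's full architecture is not reproduced on this page, but the abstract's description of RCSA maps naturally onto a channel-attention ("what") branch followed by a spatial-attention ("where") branch, wrapped in a residual connection. Below is a minimal PyTorch sketch of such a block; the class name, reduction ratio, kernel size, and pooling choices are illustrative assumptions in the spirit of SE/CBAM-style attention, not the authors' exact design.

import torch
import torch.nn as nn


class ResidualChannelSpatialAttention(nn.Module):
    # Sketch of an RCSA-style block: channel attention ("what") followed by
    # spatial attention ("where"), with a residual connection so the attended
    # features refine the input rather than replace it. The reduction ratio
    # and kernel size below are illustrative assumptions.
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel branch: squeeze to a per-channel descriptor, then excite
        # through a bottleneck MLP (implemented with 1x1 convolutions).
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial branch: convolve channel-pooled maps into a single-channel
        # spatial gate.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                      padding=spatial_kernel // 2)

    def forward(self, x):
        # "What": per-channel gates from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.channel_mlp(x.amax(dim=(2, 3), keepdim=True))
        x_c = x * torch.sigmoid(avg + mx)
        # "Where": a spatial gate from channel-wise average and max maps.
        pooled = torch.cat([x_c.mean(dim=1, keepdim=True),
                            x_c.amax(dim=1, keepdim=True)], dim=1)
        x_s = x_c * torch.sigmoid(self.spatial_conv(pooled))
        # Residual connection: attention acts as a learned refinement.
        return x + x_s


# Example: refine a batch of backbone feature maps.
feats = torch.randn(2, 256, 28, 28)
refined = ResidualChannelSpatialAttention(256)(feats)
assert refined.shape == feats.shape

Following the abstract's description, one plausible placement of such blocks in a deep aggregation network is at each aggregation node, so that features from different depths are re-weighted before being fused.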

Y. He and X. Liu are contributing authors.


Acknowledgments

This work is supported by the National Key R&D Program of China under Grant 2020YFB1708500.

Author information

Corresponding author

Correspondence to Ming-Gang Gan.

Ethics declarations

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

He, Y., Gan, MG., Liu, X. (2022). What and Where to See: Deep Attention Aggregation Network for Action Detection. In: Liu, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol. 13455. Springer, Cham. https://doi.org/10.1007/978-3-031-13844-7_18

  • DOI: https://doi.org/10.1007/978-3-031-13844-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13843-0

  • Online ISBN: 978-3-031-13844-7

  • eBook Packages: Computer Science (R0)
