
PDAN Light: An Improved Attention Network for Action Detection

  • Conference paper
  • In: Advances and Trends in Artificial Intelligence. Theory and Applications (IEA/AIE 2023)

Abstract

Action detection in densely annotated, untrimmed videos is a challenging task with significant practical implications. Not only must the right actions be discovered, but also their start and end times. Recent advances in deep neural networks, in particular the I3D network, have pushed action detection capabilities forward. This paper describes an attention network built on I3D features that incorporates state-of-the-art blocks, namely the MLP-Mixer and the Vision Permutator. Two lighter versions of the original network are proposed: PDAN light, which has 22.5% fewer parameters than the original PDAN while improving accuracy by 1.98% on average, and an MLP-Mixer-based architecture, which has 34.5% fewer parameters than the original PDAN while improving accuracy by 0.95% on average. All the code is available at https://github.com/dvidgar/PDAN_light.
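As a rough illustration of the kind of building block the MLP-Mixer-based variant relies on, the sketch below implements a standard MLP-Mixer block [19] applied to a 1-D temporal sequence of I3D features (one feature vector per time step): token mixing across time followed by channel mixing, each with layer normalization and a residual connection. The framework (PyTorch), the tensor shapes, and the hidden sizes are illustrative assumptions, not details taken from the paper; the authors' actual implementation is in the linked repository.

```python
# Minimal sketch of an MLP-Mixer block [19] over temporal I3D features.
# PyTorch, shapes, and hidden sizes are assumptions for illustration;
# see https://github.com/dvidgar/PDAN_light for the authors' code.
import torch
import torch.nn as nn


class MlpBlock(nn.Module):
    """Two-layer MLP with GELU, applied along the last dimension."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MixerBlock(nn.Module):
    """One Mixer block: mix across time (tokens), then across channels."""
    def __init__(self, num_steps: int, channels: int,
                 token_hidden: int = 256, channel_hidden: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = MlpBlock(num_steps, token_hidden)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = MlpBlock(channels, channel_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        y = self.norm1(x).transpose(1, 2)          # (batch, channels, time)
        x = x + self.token_mlp(y).transpose(1, 2)  # token mixing across time
        x = x + self.channel_mlp(self.norm2(x))    # channel mixing per step
        return x


# Example: a batch of 2 clips, 64 temporal steps of 1024-d I3D features.
feats = torch.randn(2, 64, 1024)
block = MixerBlock(num_steps=64, channels=1024)
print(block(feats).shape)  # torch.Size([2, 64, 1024])
```

Swapping attention-style blocks for purely MLP-based mixing of this kind is one plausible way to reduce parameter count while retaining temporal context, which is consistent with the parameter reductions reported in the abstract.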


References

  1. Akbari, A., Wu, J., et al.: Hierarchical signal segmentation and classification for accurate activity recognition. In: 2018 ACM International Joint Conference (2018)


  2. Batchuluun, G., Kim, J.H., et al.: Fuzzy system based human behavior recognition by combining behavior prediction and recognition. Expert Syst. Appl. 81, 108–133 (2017)


  3. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)


  4. Dai, R., Das, S., et al.: PDAN: pyramid dilated attention network for action detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021)


  5. Gumaei, A.H., Hassan, M.M., et al.: A hybrid deep learning model for human activity recognition using multimodal body sensing data. IEEE Access 7, 99152–99160 (2019)


  6. Ha, S., Yun, J.M., Choi, S.: Multi-modal convolutional neural networks for activity recognition. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 3017–3022 (2015)


  7. Liu, R., Li, Y., et al.: Are we ready for a new paradigm shift? A survey on visual deep MLP. Patterns 3(7), 100520 (2022)


  8. Lyu, L., He, X., et al.: Privacy-preserving collaborative deep learning with application to human activity recognition. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017)


  9. Mavroudi, E., Haro, B.B., Vidal, R.: Representation learning on visual-symbolic graphs for video understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12374, pp. 71–90. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58526-6_5


  10. Mazzia, V., Angarano, S., et al.: Action transformer: a self-attention model for short-time pose-based human action recognition. Pattern Recognit. 124, 108487 (2022)


  11. Moscholidou, I., Pangbourne, K.: A preliminary assessment of regulatory efforts to steer smart mobility in London and Seattle. Transp. Policy 98, 170–177 (2020)


  12. Piergiovanni, A., Ryoo, M.: Temporal Gaussian mixture layer for videos. In: International Conference on Machine Learning, pp. 5152–5161. PMLR (2019)


  13. Piergiovanni, A., Ryoo, M.S.: Learning latent super-events to detect multiple activities in videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5304–5313 (2018)


  14. Pigou, L., van den Oord, A., et al.: Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int. J. Comput. Vision 126, 430–439 (2016)


  15. Prati, A., Shan, C., Wang, K.I.K.: Sensors, vision and networks: from video surveillance to activity recognition and health monitoring. J. Ambient Intell. Smart Environ. 11, 5–22 (2019)


  16. Qi, J., Yang, P., Hanneghan, M., et al.: A hybrid hierarchical framework for gym physical activity recognition and measurement using wearable sensors. IEEE Internet Things J. 6, 1384–1393 (2019)


  17. Retuerta, D.G.: Deep learning for computer vision in smart cities. Ph.D. thesis, Department of Computer Science and Automation, Faculty of Science, University of Salamanca, Salamanca, Spain (2022)


  18. Sigurdsson, G.A., Varol, G., Wang, X., Farhadi, A., Laptev, I., Gupta, A.: Hollywood in homes: crowdsourcing data collection for activity understanding. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 510–526. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_31


  19. Tolstikhin, I.O., Houlsby, N., et al.: MLP-Mixer: an all-MLP architecture for vision. Adv. Neural Inf. Process. Syst. 34, 24261–24272 (2021)



Acknowledgements

This research has been supported by the project “Intelligent and sustainable mobility supported by multi-agent systems and edge computing (InEDGE-Mobility): Towards Sustainable Intelligent Mobility: Blockchain-based framework for IoT Security”, Reference RTI2018-095390-B-C32, financed by the Spanish Ministry of Science, Innovation and Universities, the State Research Agency, and the European Regional Development Fund. This research was also partially supported by the Shota Rustaveli National Science Foundation of Georgia (SRNSFG) under grant YS-19-1633. This work is part of the Ph.D. dissertation of David García Retuerta, “Deep Learning for Computer Vision in Smart Cities” [17], where it appears as Chapter 4.

Author information


Corresponding author

Correspondence to David Garcia-Retuerta.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Garcia-Retuerta, D., Dundua, B., Dedabrishvili, M. (2023). PDAN Light: An Improved Attention Network for Action Detection. In: Fujita, H., Wang, Y., Xiao, Y., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science, vol. 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_9


  • DOI: https://doi.org/10.1007/978-3-031-36819-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36818-9

  • Online ISBN: 978-3-031-36819-6

