Abstract
Action detection in densely annotated, untrimmed videos is a challenging and important task with direct practical applications. Not only must the right actions be discovered, but also their start and end times. Recent advances in deep neural networks, in particular the I3D network, have pushed action detection capabilities forward. This paper describes an attention network based on I3D features that incorporates state-of-the-art blocks, namely the MLP-Mixer and the Vision Permutator. Two variants of the original network are proposed: PDAN light, which has 22.5% fewer parameters than the original PDAN while improving accuracy by 1.98% on average, and an MLP-Mixer-based architecture, which has 34.5% fewer parameters than the original PDAN while improving accuracy by 0.95% on average. All the code is available at https://github.com/dvidgar/PDAN_light.
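The MLP-Mixer block named in the abstract alternates a token-mixing MLP (applied across the temporal/token axis) with a channel-mixing MLP (applied across the feature axis), each wrapped in layer normalization and a residual connection. The following is a minimal NumPy sketch of such a block; the shapes, weight scales, and the use of I3D-sized feature vectors are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, w2):
    # Two-layer MLP with the tanh approximation of GELU.
    h = x @ w1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2

def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    # Token mixing: transpose so the MLP acts across the token axis.
    y = x + mlp(layer_norm(x).transpose(0, 2, 1), token_w1, token_w2).transpose(0, 2, 1)
    # Channel mixing: the MLP acts across the channel axis.
    return y + mlp(layer_norm(y), chan_w1, chan_w2)

# Hypothetical shapes: batch of 2 clips, T=8 temporal tokens, C=16 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 16))
out = mixer_block(
    x,
    rng.standard_normal((8, 32)) * 0.02, rng.standard_normal((32, 8)) * 0.02,
    rng.standard_normal((16, 64)) * 0.02, rng.standard_normal((64, 16)) * 0.02,
)
print(out.shape)
```

Because both mixing steps are plain MLPs rather than self-attention, parameter count scales with the (fixed) token and channel sizes, which is what makes MLP-Mixer-style blocks attractive for the lighter variants described above.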
References
Akbari, A., Wu, J., et al.: Hierarchical signal segmentation and classification for accurate activity recognition. In: 2018 ACM International Joint Conference (2018)
Batchuluun, G., Kim, J.H., et al.: Fuzzy system based human behavior recognition by combining behavior prediction and recognition. Expert Syst. Appl. 81, 108–133 (2017)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)
Dai, R., Das, S., et al.: PDAN: pyramid dilated attention network for action detection. In: CVF Winter Conference on Applications of Computer Vision (2021)
Gumaei, A.H., Hassan, M.M., et al.: A hybrid deep learning model for human activity recognition using multimodal body sensing data. IEEE Access 7, 99152–99160 (2019)
Ha, S., Yun, J.M., Choi, S.: Multi-modal convolutional neural networks for activity recognition. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 3017–3022 (2015)
Liu, R., Li, Y., et al.: Are we ready for a new paradigm shift? A survey on visual deep MLP. Patterns 3(7), 100520 (2022)
Lyu, L., He, X., et al.: Privacy-preserving collaborative deep learning with application to human activity recognition. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017)
Mavroudi, E., Haro, B.B., Vidal, R.: Representation learning on visual-symbolic graphs for video understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12374, pp. 71–90. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58526-6_5
Mazzia, V., Angarano, S., et al.: Action transformer: a self-attention model for short-time pose-based human action recognition. Pattern Recognit. 1084–1087 (2021)
Moscholidou, I., Pangbourne, K.: A preliminary assessment of regulatory efforts to steer smart mobility in London and Seattle. Transp. Policy 98, 170–177 (2020)
Piergiovanni, A., Ryoo, M.: Temporal gaussian mixture layer for videos. In: International Conference on Machine Learning, pp. 5152–5161. PMLR (2019)
Piergiovanni, A., Ryoo, M.S.: Learning latent super-events to detect multiple activities in videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5304–5313 (2018)
Pigou, L., van den Oord, A., et al.: Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int. J. Comput. Vision 126, 430–439 (2016)
Prati, A., Shan, C., Wang, K.I.K.: Sensors, vision and networks: from video surveillance to activity recognition and health monitoring. J. Ambient Intell. Smart Environ. 11, 5–22 (2019)
Qi, J., Yang, P., Hanneghan, M., et al.: A hybrid hierarchical framework for gym physical activity recognition and measurement using wearable sensors. IEEE Internet Things J. 6, 1384–1393 (2019)
Retuerta, D.G.: Deep learning for computer vision in smart cities. Ph.D. thesis, Department of Computer Science and Automation, Faculty of Science, the University of Salamanca, Salamanca, Spain (2022)
Sigurdsson, G.A., Varol, G., Wang, X., Farhadi, A., Laptev, I., Gupta, A.: Hollywood in homes: crowdsourcing data collection for activity understanding. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 510–526. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_31
Tolstikhin, I.O., Houlsby, N., et al.: MLP-mixer: an all-MLP architecture for vision. Adv. Neural. Inf. Process. Syst. 34, 24261–24272 (2021)
Acknowledgements
This research has been supported by “Intelligent and sustainable mobility supported by multi-agent systems and edge computing (InEDGE-Mobility): Towards Sustainable Intelligent Mobility: Blockchain-based framework for IoT Security”, Reference: RTI2018-095390-B-C32, financed by the Spanish Ministry of Science, Innovation and Universities, the State Research Agency and the European Regional Development Fund. This research was also partially supported by Shota Rustaveli National Science Foundation of Georgia (SRNSFG) under the grant YS-19-1633. This work is part of the PhD dissertation of David García Retuerta “Deep Learning for Computer Vision in Smart Cities”, and can be found under Chapter 4 [17].
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Garcia-Retuerta, D., Dundua, B., Dedabrishvili, M. (2023). PDAN Light: An Improved Attention Network for Action Detection. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science(), vol 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_9
Print ISBN: 978-3-031-36818-9
Online ISBN: 978-3-031-36819-6