Abstract
Action detection in densely annotated, untrimmed videos is a challenging and important task with direct practical applications. Not only must the right actions be discovered, but also their start and end times. Recent advances in deep neural networks, in particular the I3D network, have pushed action detection capabilities forward. This paper describes an attention network based on I3D features that incorporates state-of-the-art blocks, namely the MLP-Mixer and the Vision Permutator. Two variants of the original network are proposed: PDAN light, which has 22.5% fewer parameters than the original PDAN while improving accuracy by 1.98% on average, and an MLP-Mixer-based architecture, which has 34.5% fewer parameters than the original PDAN while improving accuracy by 0.95% on average. All the code is available at https://github.com/dvidgar/PDAN_light.
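The MLP-Mixer block named in the abstract alternates a token-mixing MLP (applied across the temporal/token axis) with a channel-mixing MLP (applied across the feature axis), each wrapped in layer normalization and a residual connection. The following is a minimal NumPy sketch of such a block; the shapes, weight scales, and the use of I3D-sized feature vectors are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, w2):
    # Two-layer MLP with the tanh approximation of GELU.
    h = x @ w1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2

def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    # Token mixing: transpose so the MLP acts across the token axis.
    y = x + mlp(layer_norm(x).transpose(0, 2, 1), token_w1, token_w2).transpose(0, 2, 1)
    # Channel mixing: the MLP acts across the channel axis.
    return y + mlp(layer_norm(y), chan_w1, chan_w2)

# Hypothetical shapes: batch of 2 clips, T=8 temporal tokens, C=16 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 16))
out = mixer_block(
    x,
    rng.standard_normal((8, 32)) * 0.02, rng.standard_normal((32, 8)) * 0.02,
    rng.standard_normal((16, 64)) * 0.02, rng.standard_normal((64, 16)) * 0.02,
)
print(out.shape)
```

Because both mixing steps are plain MLPs rather than self-attention, parameter count scales with the (fixed) token and channel sizes, which is what makes MLP-Mixer-style blocks attractive for the lighter variants described above.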
References
Akbari, A., Wu, J., et al.: Hierarchical signal segmentation and classification for accurate activity recognition. In: 2018 ACM International Joint Conference (2018)
Batchuluun, G., Kim, J.H., et al.: Fuzzy system based human behavior recognition by combining behavior prediction and recognition. Expert Syst. Appl. 81, 108–133 (2017)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)
Dai, R., Das, S., et al.: PDAN: pyramid dilated attention network for action detection. In: CVF Winter Conference on Applications of Computer Vision (2021)
Gumaei, A.H., Hassan, M.M., et al.: A hybrid deep learning model for human activity recognition using multimodal body sensing data. IEEE Access 7, 99152–99160 (2019)
Ha, S., Yun, J.M., Choi, S.: Multi-modal convolutional neural networks for activity recognition. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 3017–3022 (2015)
Liu, R., Li, Y., et al.: Are we ready for a new paradigm shift? A survey on visual deep MLP. Patterns 3(7), 100520 (2022)
Lyu, L., He, X., et al.: Privacy-preserving collaborative deep learning with application to human activity recognition. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2017)
Mavroudi, E., Haro, B.B., Vidal, R.: Representation learning on visual-symbolic graphs for video understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12374, pp. 71–90. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58526-6_5
Mazzia, V., Angarano, S., et al.: Action transformer: a self-attention model for short-time pose-based human action recognition. Pattern Recognit. 1084–1087 (2021)
Moscholidou, I., Pangbourne, K.: A preliminary assessment of regulatory efforts to steer smart mobility in London and Seattle. Transp. Policy 98, 170–177 (2020)
Piergiovanni, A., Ryoo, M.: Temporal gaussian mixture layer for videos. In: International Conference on Machine Learning, pp. 5152–5161. PMLR (2019)
Piergiovanni, A., Ryoo, M.S.: Learning latent super-events to detect multiple activities in videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5304–5313 (2018)
Pigou, L., van den Oord, A., et al.: Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int. J. Comput. Vision 126, 430–439 (2016)
Prati, A., Shan, C., Wang, K.I.K.: Sensors, vision and networks: from video surveillance to activity recognition and health monitoring. J. Ambient Intell. Smart Environ. 11, 5–22 (2019)
Qi, J., Yang, P., Hanneghan, M., et al.: A hybrid hierarchical framework for gym physical activity recognition and measurement using wearable sensors. IEEE Internet Things J. 6, 1384–1393 (2019)
Retuerta, D.G.: Deep learning for computer vision in smart cities. Ph.D. thesis, Department of Computer Science and Automation, Faculty of Science, the University of Salamanca, Salamanca, Spain (2022)
Sigurdsson, G.A., Varol, G., Wang, X., Farhadi, A., Laptev, I., Gupta, A.: Hollywood in homes: crowdsourcing data collection for activity understanding. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 510–526. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_31
Tolstikhin, I.O., Houlsby, N., et al.: MLP-mixer: an all-MLP architecture for vision. Adv. Neural. Inf. Process. Syst. 34, 24261–24272 (2021)
Acknowledgements
This research has been supported by “Intelligent and sustainable mobility supported by multi-agent systems and edge computing (InEDGE-Mobility): Towards Sustainable Intelligent Mobility: Blockchain-based framework for IoT Security”, Reference: RTI2018-095390-B-C32, financed by the Spanish Ministry of Science, Innovation and Universities, the State Research Agency and the European Regional Development Fund. This research was also partially supported by Shota Rustaveli National Science Foundation of Georgia (SRNSFG) under the grant YS-19-1633. This work is part of the PhD dissertation of David García Retuerta “Deep Learning for Computer Vision in Smart Cities”, and can be found under Chapter 4 [17].
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Garcia-Retuerta, D., Dundua, B., Dedabrishvili, M. (2023). PDAN Light: An Improved Attention Network for Action Detection. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science(), vol 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_9
Print ISBN: 978-3-031-36818-9
Online ISBN: 978-3-031-36819-6