Abstract
This paper proposes a method for real-time human activity detection and recognition in the compressed domain of videos using motion vectors and an attention-guided bidirectional LSTM, termed MVABLSTM. Videos in the MPEG-4 and H.264 compression formats are considered in the present study. By adapting the proposed method to various video codecs and camera settings, any video source can be handled without prior setup. Existing algorithms for human action recognition in compressed-domain video have limitations in this regard: (i) they require keyframes at a fixed interval, (ii) they use P-frames only, and (iii) they typically support a single codec. The proposed method overcomes these limitations by allowing arbitrary keyframe intervals, using both P- and B-frames, and supporting both the MPEG-4 and H.264 codecs. Experiments on the benchmark datasets UCF101, HMDB51, and THUMOS14 show that recognition accuracy in the compressed domain is comparable to that observed on raw video data, but with reduced computational time. The proposed MVABLSTM method outperforms other recent methods in the literature with 65% fewer parameters and 92% fewer GFLOPS, while improving accuracy by 0.8%, 5.95%, and 16.65% on UCF101, HMDB51, and THUMOS14, respectively, and speed by 8% in the MPEG-4 domain. The performance of the proposed method is analyzed using MVABLSTM variants in different codecs in comparison with state-of-the-art network models.
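To make the attention-guided idea concrete, the following is a minimal sketch (not the paper's implementation) of attention-weighted temporal pooling over per-frame features, such as the outputs a bidirectional LSTM would produce from motion-vector inputs. The attention vector `w`, the feature dimension, and the random stand-in features are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features, w):
    """Attention-guided temporal pooling: score each frame's feature
    vector against w, normalize scores with softmax, and return the
    attention-weighted sum as a single clip descriptor."""
    scores = features @ w            # (T,) one relevance score per frame
    alpha = softmax(scores)          # attention weights, sum to 1
    pooled = alpha @ features        # (D,) weighted combination of frames
    return pooled, alpha

rng = np.random.default_rng(0)
T, D = 8, 16                         # frames per clip, feature dimension
feats = rng.standard_normal((T, D))  # stand-in for per-frame BiLSTM outputs
w = rng.standard_normal(D)           # stand-in learned attention vector
pooled, alpha = attention_pool(feats, w)
print(pooled.shape, alpha.sum())
```

In the full method, `w` (and the BiLSTM producing `feats`) would be learned jointly with the classifier, so the attention weights emphasize the frames whose motion-vector patterns are most discriminative for the action class.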
Data availability
The data used in this paper are taken from the publicly available benchmark datasets.
Acknowledgements
The authors are indebted to the Reviewers for their helpful comments and suggestions, which greatly improved the quality of the paper.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Praveenkumar, S.M., Patil, P. & Hiremath, P.S. A novel algorithm for human action recognition in compressed domain using attention-guided approach. J Real-Time Image Proc 20, 122 (2023). https://doi.org/10.1007/s11554-023-01374-9