DOI: 10.1145/3573942.3574089
research-article

A Weight Initialization Method for Compressed Video Action Recognition in Compressed Domain

Published: 16 May 2023

Abstract

The exponential growth of big data, especially video from smart devices and video-sharing sites, has become a real challenge for video analysis algorithms. Traditional video processing architectures, which mostly operate on RGB frames, suffer chiefly from processing and storage difficulties: decoding compressed videos is time-consuming and requires a lot of storage space. Although existing video analysis architectures based on convolutional neural networks (CNNs) have made notable advances, they can still hardly meet the requirements of many real-time scenarios and real-world applications. This is one of the motivations for the computer vision community to move toward action recognition performed directly on compressed videos in the compressed domain. In addition, the performance of prominent methods depends strongly on the correct setting of initialization parameters, and the choice of initialization affects the final generalization performance of a neural network. This work proposes a weight initialization technique for action recognition on compressed videos in the compressed domain. Our approach was tested on the UCF-101 and HMDB-51 datasets. The performance evaluation shows the effectiveness of the proposed methodology.


Published In

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
September 2022
1221 pages
ISBN: 9781450396899
DOI: 10.1145/3573942

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

AIPR 2022
