A Weight Initialization Method for Compressed Video Action Recognition in Compressed Domain

Authors:

Allah Rakhio Junejo,

Zhuoming Li

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

Pages 740 - 745

https://doi.org/10.1145/3573942.3574089

Published: 16 May 2023

Abstract

The rapid growth of video data, especially from smart devices and video-sharing sites, poses a real challenge for video analysis algorithms. Traditional video processing architectures mostly operate on RGB frames, so they must first decode the compressed video, which is time-consuming and requires a lot of storage space. Although existing architectures based on convolutional neural networks (CNNs) have made notable advances, they still struggle to meet the requirements of many real-time scenarios and real-world applications. This is one of the motivations for the computer vision community to perform action recognition directly on compressed videos in the compressed domain. At the same time, the performance of prominent methods depends strongly on how their parameters are initialized, and the choice of initialization affects the final generalization performance of a neural network. This work proposes a weight initialization technique for action recognition on compressed videos in the compressed domain. Our approach was tested on the UCF-101 and HMDB-51 datasets, and the performance evaluation shows the effectiveness of the proposed methodology.

Index Terms

A Weight Initialization Method for Compressed Video Action Recognition in Compressed Domain
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 2022

1221 pages

ISBN:9781450396899

DOI:10.1145/3573942

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIPR 2022

AIPR 2022: 2022 5th International Conference on Artificial Intelligence and Pattern Recognition

September 23 - 25, 2022

Xiamen, China
