ABSTRACT
Deep-learning-based video analysis solutions have become indispensable components of today's intelligent sensing applications. In a networked camera system, an efficient way to analyze the captured videos is to extract deep learning features at local cameras or edge devices and then transmit those features to powerful processing hubs for further analysis. Because substantial redundancy exists among the different feature maps of the same video frame, the feature maps can be compressed before transmission to save bandwidth. This paper introduces a new rate-distortion optimized framework for compressing the intermediate deep features extracted from the key frames of a video. First, to reduce the redundancy among features, a feature selection strategy based on hierarchical clustering is designed. The selected features are then quantized, repacked as videos, and further compressed using a standardized video encoder. In addition, the proposed framework incorporates rate-distortion models built for three representative computer vision tasks: image classification, image segmentation, and image retrieval. A corresponding rate-distortion optimization module is designed to enhance the performance of these tasks under rate constraints. Experimental results show that the proposed deep feature compression framework improves compression performance over the standard HEVC video encoder.
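The first stages of the pipeline described above (clustering-based feature selection, quantization, and repacking) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: average linkage with Euclidean distance, the medoid as the cluster representative, per-map min-max quantization to 8 bits, and a simple grid layout for the repacked frame. The function names are hypothetical, and the final video-encoding step (for example, with an HEVC encoder) is left out.

```python
import math

def distance(a, b):
    """Euclidean distance between two flattened feature maps."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_maps(maps, num_clusters):
    """Agglomerative clustering with average linkage (an assumed choice).
    Returns a list of clusters, each a list of feature-map indices."""
    clusters = [[i] for i in range(len(maps))]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(distance(maps[a], maps[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

def select_representatives(maps, clusters):
    """Keep one map per cluster: the medoid (smallest total distance to
    the other members). The selection rule is an assumption."""
    reps = []
    for members in clusters:
        reps.append(min(members, key=lambda m: sum(
            distance(maps[m], maps[o]) for o in members)))
    return sorted(reps)

def quantize(values, bits=8):
    """Uniform min-max quantization to integers in [0, 2**bits - 1].
    Returns the quantized values plus (lo, hi) needed for dequantization."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return [0] * len(values), lo, hi
    return [round((v - lo) / (hi - lo) * levels) for v in values], lo, hi

def tile_into_frame(qmaps, height, width, cols):
    """Place quantized maps in a grid to form one frame; unused tiles stay 0."""
    rows = math.ceil(len(qmaps) / cols)
    frame = [[0] * (cols * width) for _ in range(rows * height)]
    for k, qmap in enumerate(qmaps):
        r0, c0 = (k // cols) * height, (k % cols) * width
        for y in range(height):
            for x in range(width):
                frame[r0 + y][c0 + x] = qmap[y * width + x]
    return frame

# Toy input: four flattened 2x2 feature maps forming two near-duplicate pairs.
feature_maps = [
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.0, 2.0, 3.0],
    [5.0, 5.0, 0.0, 0.0],
    [5.0, 5.1, 0.0, 0.0],
]
clusters = cluster_maps(feature_maps, num_clusters=2)
selected = select_representatives(feature_maps, clusters)
quantized = [quantize(feature_maps[i])[0] for i in selected]
frame = tile_into_frame(quantized, height=2, width=2, cols=2)
print(selected)
print(frame)
```

In this toy case, half of the feature maps are dropped before quantization. A receiver would also need the selected indices and each map's (lo, hi) range as side information to reconstruct approximate features.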
Index Terms
- Deep Feature Compression with Rate-Distortion Optimization for Networked Camera Systems