DOI: 10.1145/3587819.3590974
research-article

Deep Feature Compression with Rate-Distortion Optimization for Networked Camera Systems

Published: 08 June 2023

ABSTRACT

Deep-learning-based video analysis solutions have become indispensable components in today's intelligent sensing applications. In a networked camera system, an efficient way to analyze the captured videos is to extract the features for deep learning at local cameras or edge devices and then transmit the features to powerful processing hubs for further analysis. As there exists substantial redundancy among different feature maps from the same video frame, the feature maps could be compressed before transmission to save bandwidth. This paper introduces a new rate-distortion optimized framework for compressing the intermediate deep features from the key frames of a video. First, to reduce the redundancy among different features, a feature selection strategy is designed based on hierarchical clustering. The selected features are then quantized, repacked as videos, and further compressed using a standardized video encoder. Furthermore, the proposed framework incorporates rate-distortion models that are built for three representative computer vision tasks: image classification, image segmentation, and image retrieval. A corresponding rate-distortion optimization module is designed to enhance the performance of common computer vision tasks under rate constraints. Experimental results show that the proposed deep feature compression framework can boost the compression performance over the standard HEVC video encoder.
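
The first two stages named in the abstract (selecting features by hierarchical clustering, then quantizing the selected features) can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the linkage rule, how a representative is picked from each cluster, or the quantizer, so everything below is an assumption made for illustration: correlation distance, average linkage, a medoid as the cluster representative, and uniform min-max quantization. The function names are hypothetical. Repacking the quantized maps into video frames and running a video encoder are not shown.

```python
import math


def correlation_distance(a, b):
    """Return 1 - |Pearson correlation| of two flattened feature maps (assumed metric)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a constant map carries no correlation information
    return 1.0 - abs(cov / math.sqrt(var_a * var_b))


def select_representatives(maps, num_clusters):
    """Agglomerative (average-linkage) clustering; return one map index per cluster."""
    n = len(maps)
    dist = [[correlation_distance(maps[i], maps[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        # Merge the closest pair of clusters.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    # Keep the medoid: the member closest on average to the rest of its cluster.
    reps = [min(c, key=lambda i: sum(dist[i][j] for j in c)) for c in clusters]
    return sorted(reps)


def quantize(values, bits=8):
    """Uniform min-max quantization to integers in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return [0] * len(values), lo, hi
    scale = levels / (hi - lo)
    return [round((v - lo) * scale) for v in values], lo, hi


# Toy input: maps 0 and 1 are perfectly correlated, map 2 is not.
feature_maps = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.0],
    [1.0, -1.0, 1.0, -1.0],
]
selected = select_representatives(feature_maps, num_clusters=2)
quantized = [quantize(feature_maps[i], bits=8)[0] for i in selected]
```

In this toy input the two perfectly correlated maps fall into one cluster, so only one of them is kept. In a rate-constrained setting, `num_clusters` and `bits` would be the natural knobs for trading bitrate against task accuracy; how the paper's rate-distortion module actually chooses its parameters is not stated in the abstract.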


Published in

MMSys '23: Proceedings of the 14th ACM Multimedia Systems Conference
June 2023
495 pages
ISBN: 9798400701481
DOI: 10.1145/3587819

Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 176 of 530 submissions, 33%