Abstract
With the growth of computer vision-based applications, an explosive number of images has been uploaded to cloud servers that host such online computer vision algorithms, usually in the form of deep learning models. JPEG has been used as the de facto compression and encapsulation method for images. However, the standard JPEG configuration does not always perform well for compressing images that are to be processed by a deep learning model—for example, the standard quality level of JPEG leads to 50% size overhead (compared with the best quality level selection) on ImageNet at the same inference accuracy for popular computer vision models (e.g., InceptionNet and ResNet). Even knowing this, designing a better JPEG configuration for online computer vision-based services remains extremely challenging. First, cloud-based computer vision models are usually a black box to end-users; thus, a JPEG configuration must be designed without knowledge of the model structures. Second, the “optimal” JPEG configuration is not fixed; instead, it is determined by confounding factors, including the characteristics of the input images and the model, the expected accuracy and image size, and so forth. In this article, we propose a reinforcement learning (RL)-based adaptive JPEG configuration framework, AdaCompress. In particular, we design an edge (i.e., user-side) RL agent that learns the optimal compression quality level to achieve an expected inference accuracy and upload image size, using only the online inference results, without knowing details of the model structures. Furthermore, we design an explore-exploit mechanism that lets the framework quickly switch agents when it detects performance degradation, typically caused by a change in the input (e.g., images captured in daytime versus at night).
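The agent described above can be illustrated with a minimal sketch. The paper's agent is a deep RL model; here a simple epsilon-greedy bandit over discrete JPEG quality levels shows the core idea: pick a quality level, query the black-box vision API, and update value estimates from a reward that trades accuracy against upload size. All names (`QUALITY_LEVELS`, `simulated_round`, the reward coefficients) and the synthetic accuracy/size model are illustrative assumptions, not the paper's actual design or values.

```python
import random

QUALITY_LEVELS = [15, 35, 55, 75, 95]  # candidate JPEG quality settings (assumed)

def reward(correct, size_kb, size_penalty=0.004):
    """Reward trades off inference correctness against upload size (assumed form)."""
    return (1.0 if correct else 0.0) - size_penalty * size_kb

class QualityAgent:
    """Epsilon-greedy bandit over JPEG quality levels (stand-in for the deep RL agent)."""
    def __init__(self, epsilon=0.1, lr=0.2):
        self.q = {lvl: 0.0 for lvl in QUALITY_LEVELS}  # per-level value estimates
        self.epsilon = epsilon
        self.lr = lr

    def choose(self):
        if random.random() < self.epsilon:      # explore a random quality level
            return random.choice(QUALITY_LEVELS)
        return max(self.q, key=self.q.get)      # exploit the current best estimate

    def update(self, level, r):
        # Constant-step-size update toward the observed reward
        self.q[level] += self.lr * (r - self.q[level])

def simulated_round(level):
    """Synthetic stand-in for uploading to a black-box vision API:
    higher quality -> larger file and a higher chance of a correct label."""
    import math
    size_kb = float(level)                      # toy: size grows linearly with quality
    p_correct = 1.0 - math.exp(-level / 30.0)   # toy: accuracy saturates with quality
    return random.random() < p_correct, size_kb

random.seed(0)
agent = QualityAgent()
for _ in range(2000):
    lvl = agent.choose()
    correct, size_kb = simulated_round(lvl)
    agent.update(lvl, reward(correct, size_kb))

best = max(agent.q, key=agent.q.get)
print("learned quality level:", best)
```

Because the reward penalizes size, the learned level tends to settle below the maximum quality, mirroring the size/accuracy trade-off the abstract describes.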
Our evaluation experiments using real-world online computer vision-based APIs from Amazon Rekognition, Face++, and Baidu Vision show that our approach outperforms existing baselines by reducing the size of images by one-half to one-third while the overall classification accuracy decreases only slightly. Meanwhile, AdaCompress promptly re-trains or re-loads the RL agent to maintain performance.
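The re-train/re-load behavior above hinges on detecting performance degradation. A hedged sketch of that detection step: monitor a sliding window of recent rewards and flag the current agent for retraining or reloading once the window average falls below a threshold (e.g., after the input distribution shifts from daytime to night images). The window size and threshold here are illustrative assumptions, not the paper's values.

```python
from collections import deque

class DegradationMonitor:
    """Flags the RL agent for re-training/re-loading when recent rewards collapse."""
    def __init__(self, window=50, threshold=0.4):
        self.rewards = deque(maxlen=window)  # sliding window of recent rewards
        self.threshold = threshold

    def observe(self, r):
        self.rewards.append(r)

    def degraded(self):
        # Only judge once the window is full, to avoid cold-start noise.
        full = len(self.rewards) == self.rewards.maxlen
        return full and sum(self.rewards) / len(self.rewards) < self.threshold

monitor = DegradationMonitor()

# Stable phase: rewards hover around 0.8, so no switch is triggered.
for _ in range(60):
    monitor.observe(0.8)
print("stable phase, degraded:", monitor.degraded())

# Input shift (e.g., daytime -> night images): rewards collapse and the
# monitor flags the agent, triggering the explore phase / agent reload.
for _ in range(60):
    monitor.observe(0.1)
print("after shift, degraded:", monitor.degraded())
```

In the full framework, a positive `degraded()` signal would trigger the explore-exploit mechanism to re-enter exploration or load an agent trained for the new input distribution.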
Index Terms
- Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach