Abstract
With the growth of computer vision-based applications, an explosive number of images has been uploaded to cloud servers that host such online computer vision algorithms, usually in the form of deep learning models. JPEG has been used as the de facto compression and encapsulation method for images. However, the standard JPEG configuration does not always perform well for compressing images that are to be processed by a deep learning model—for example, the standard quality level of JPEG leads to 50% size overhead (compared with the best quality level selection) on ImageNet at the same inference accuracy for popular computer vision models (e.g., InceptionNet and ResNet). Even knowing this, designing a better JPEG configuration for online computer vision-based services remains extremely challenging. First, cloud-based computer vision models are usually a black box to end-users; thus, a JPEG configuration must be designed without knowledge of the model structures. Second, the “optimal” JPEG configuration is not fixed; instead, it is determined by confounding factors, including the characteristics of the input images and the model, the expected accuracy and image size, and so forth. In this article, we propose a reinforcement learning (RL)-based adaptive JPEG configuration framework, AdaCompress. In particular, we design an edge (i.e., user-side) RL agent that learns the optimal compression quality level to achieve an expected inference accuracy and upload image size, using only the online inference results, without knowing details of the model structures. Furthermore, we design an explore-exploit mechanism that lets the framework quickly switch agents when it detects performance degradation, typically caused by a change in the input (e.g., images captured in daytime versus at night).
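The agent described above can be illustrated with a minimal sketch. The paper's agent is a deep RL model; here a simple epsilon-greedy bandit over discrete JPEG quality levels shows the core idea: pick a quality level, query the black-box vision API, and update value estimates from a reward that trades accuracy against upload size. All names (`QUALITY_LEVELS`, `simulated_round`, the reward coefficients) and the synthetic accuracy/size model are illustrative assumptions, not the paper's actual design or values.

```python
import random

QUALITY_LEVELS = [15, 35, 55, 75, 95]  # candidate JPEG quality settings (assumed)

def reward(correct, size_kb, size_penalty=0.004):
    """Reward trades off inference correctness against upload size (assumed form)."""
    return (1.0 if correct else 0.0) - size_penalty * size_kb

class QualityAgent:
    """Epsilon-greedy bandit over JPEG quality levels (stand-in for the deep RL agent)."""
    def __init__(self, epsilon=0.1, lr=0.2):
        self.q = {lvl: 0.0 for lvl in QUALITY_LEVELS}  # per-level value estimates
        self.epsilon = epsilon
        self.lr = lr

    def choose(self):
        if random.random() < self.epsilon:      # explore a random quality level
            return random.choice(QUALITY_LEVELS)
        return max(self.q, key=self.q.get)      # exploit the current best estimate

    def update(self, level, r):
        # Constant-step-size update toward the observed reward
        self.q[level] += self.lr * (r - self.q[level])

def simulated_round(level):
    """Synthetic stand-in for uploading to a black-box vision API:
    higher quality -> larger file and a higher chance of a correct label."""
    import math
    size_kb = float(level)                      # toy: size grows linearly with quality
    p_correct = 1.0 - math.exp(-level / 30.0)   # toy: accuracy saturates with quality
    return random.random() < p_correct, size_kb

random.seed(0)
agent = QualityAgent()
for _ in range(2000):
    lvl = agent.choose()
    correct, size_kb = simulated_round(lvl)
    agent.update(lvl, reward(correct, size_kb))

best = max(agent.q, key=agent.q.get)
print("learned quality level:", best)
```

Because the reward penalizes size, the learned level tends to settle below the maximum quality, mirroring the size/accuracy trade-off the abstract describes.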
Our evaluation experiments using real-world online computer vision-based APIs from Amazon Rekognition, Face++, and Baidu Vision show that our approach outperforms existing baselines by reducing the size of images by one-half to one-third while the overall classification accuracy decreases only slightly. Meanwhile, AdaCompress promptly re-trains or re-loads the RL agent to maintain performance.
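The re-train/re-load behavior above hinges on detecting performance degradation. A hedged sketch of that detection step: monitor a sliding window of recent rewards and flag the current agent for retraining or reloading once the window average falls below a threshold (e.g., after the input distribution shifts from daytime to night images). The window size and threshold here are illustrative assumptions, not the paper's values.

```python
from collections import deque

class DegradationMonitor:
    """Flags the RL agent for re-training/re-loading when recent rewards collapse."""
    def __init__(self, window=50, threshold=0.4):
        self.rewards = deque(maxlen=window)  # sliding window of recent rewards
        self.threshold = threshold

    def observe(self, r):
        self.rewards.append(r)

    def degraded(self):
        # Only judge once the window is full, to avoid cold-start noise.
        full = len(self.rewards) == self.rewards.maxlen
        return full and sum(self.rewards) / len(self.rewards) < self.threshold

monitor = DegradationMonitor()

# Stable phase: rewards hover around 0.8, so no switch is triggered.
for _ in range(60):
    monitor.observe(0.8)
print("stable phase, degraded:", monitor.degraded())

# Input shift (e.g., daytime -> night images): rewards collapse and the
# monitor flags the agent, triggering the explore phase / agent reload.
for _ in range(60):
    monitor.observe(0.1)
print("after shift, degraded:", monitor.degraded())
```

In the full framework, a positive `degraded()` signal would trigger the explore-exploit mechanism to re-enter exploration or load an agent trained for the new input distribution.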
Index Terms
- Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach