Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach

Published: 12 November 2021

Abstract

With the growth of computer vision-based applications, an explosive number of images has been uploaded to the cloud servers that host such online computer vision algorithms, usually in the form of deep learning models. JPEG is the de facto compression and encapsulation method for these images. However, the standard JPEG configuration does not always perform well for compressing images that are to be processed by a deep learning model—for example, the standard JPEG quality level incurs a 50% size overhead (compared with the best quality-level selection) on ImageNet at the same inference accuracy in popular computer vision models (e.g., InceptionNet and ResNet). Even knowing this, designing a better JPEG configuration for online computer vision-based services remains extremely challenging. First, cloud-based computer vision models are usually a black box to end users, so it is difficult to design a JPEG configuration without knowing their model structures. Second, the “optimal” JPEG configuration is not fixed; it is determined by confounding factors, including the characteristics of the input images and the model, the expected accuracy and image size, and so forth. In this article, we propose a reinforcement learning (RL)-based adaptive JPEG configuration framework, AdaCompress. In particular, we design an edge (i.e., user-side) RL agent that learns the compression quality level that achieves the expected inference accuracy and upload image size, using only the online inference results and without knowing details of the model structures. Furthermore, we design an explore-exploit mechanism that lets the framework quickly switch agents when it detects a performance degradation, mainly caused by input changes (e.g., images captured in daytime versus at night).
Our evaluation experiments using real-world online computer vision APIs from Amazon Rekognition, Face++, and Baidu Vision show that our approach outperforms existing baselines, reducing the size of images by one-half to one-third while the overall classification accuracy decreases only slightly. Meanwhile, AdaCompress promptly retrains or reloads the RL agent to maintain performance.
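Although the paper's implementation is not shown on this page, the loop the abstract describes—an edge-side agent that picks a JPEG quality level using only black-box inference feedback, plus a trigger for switching agents on performance degradation—can be sketched as a simple epsilon-greedy bandit. Everything below is an illustrative assumption, not the authors' method: `compress_and_infer` is a simulated stand-in for the real JPEG encoder plus cloud API call, and the quality levels and reward weights are hypothetical.

```python
import random

# Candidate JPEG quality levels the agent can choose from (assumed set).
QUALITIES = [15, 35, 55, 75, 95]

def compress_and_infer(quality):
    """Stand-in for JPEG-compressing an image and querying a cloud API.

    Simulated curves: upload size grows linearly with quality, while
    accuracy saturates once quality is high enough. A real deployment
    would encode the image and read back the API's inference result.
    """
    size = quality * 1.8                        # relative upload size
    accuracy = min(1.0, 0.5 + quality / 100.0)  # simulated accuracy
    return size, accuracy

def reward(size, accuracy, size_ref=95 * 1.8):
    # Trade accuracy off against upload size (the weight is an assumption).
    return accuracy - 0.5 * (size / size_ref)

def run_agent(steps=300, eps=0.1, seed=0):
    """Epsilon-greedy agent: returns the quality level it estimates best."""
    rng = random.Random(seed)
    q, n = {}, {}
    for a in QUALITIES:                         # initialize: try each level once
        size, acc = compress_and_infer(a)
        q[a], n[a] = reward(size, acc), 1
    for _ in range(steps):
        a = rng.choice(QUALITIES) if rng.random() < eps else max(q, key=q.get)
        size, acc = compress_and_infer(a)
        n[a] += 1
        q[a] += (reward(size, acc) - q[a]) / n[a]  # incremental mean update
    return max(q, key=q.get)

def degraded(recent_rewards, baseline, tol=0.2):
    # Sketch of the retrain/reload trigger: flag when the average recent
    # reward falls well below the level observed during training.
    return sum(recent_rewards) / len(recent_rewards) < baseline - tol
```

Under these simulated curves the agent settles on a mid-range quality (55 here), mirroring the abstract's point that neither the default nor the maximum quality level is optimal. When `degraded` fires (e.g., after a daytime-to-night input shift), the framework would retrain or reload an agent for the new input distribution.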


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4 (November 2021), 529 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3492437


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 April 2020
• Revised: 1 January 2021
• Accepted: 1 January 2021
• Published: 12 November 2021

Published in TOMM Volume 17, Issue 4

        Qualifiers

        • research-article
        • Refereed
