DOI: 10.1145/3427921.3450256
Short paper · Public Access

PieSlicer: Dynamically Improving Response Time for Cloud-based CNN Inference

Published: 9 April 2021

ABSTRACT

Executing deep-learning inference on cloud servers enables mobile devices with limited resources to use high-complexity models. However, pre-execution time (the time it takes to prepare and transfer data to the cloud) is variable and can take orders of magnitude longer to complete than inference execution itself. Pre-execution time can be reduced by dynamically deciding the order of two essential steps, preprocessing and data transfer, to better exploit on-device resources and network conditions. In this work, we present PieSlicer, a system that uses linear regression models to make dynamic preprocessing decisions and improve cloud inference performance. PieSlicer leverages these models to select the appropriate preprocessing location. We show that for image classification applications, PieSlicer reduces median and 99th percentile pre-execution time by up to 50.2 ms and 217.2 ms respectively when compared to static preprocessing methods.
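The decision the abstract describes, choosing where preprocessing runs based on predicted latency, can be sketched roughly as follows. This is a minimal illustration, not PieSlicer's actual implementation: all function names, coefficients, and the simple size-linear latency form are assumptions for the sketch.

```python
# Hypothetical sketch (names and coefficients assumed, not from the paper):
# linear latency models decide where preprocessing should run.

def linear_model(coef_ms_per_byte, intercept_ms):
    """Return a latency predictor: latency_ms = coef * size_bytes + intercept."""
    return lambda size_bytes: coef_ms_per_byte * size_bytes + intercept_ms

def choose_preprocess_location(image_bytes, resized_bytes,
                               device_preprocess, cloud_preprocess, transfer):
    """Estimate pre-execution time for both orderings and pick the cheaper one.

    "device": preprocess locally (shrinking the image), then transfer the result.
    "cloud":  transfer the full-size image, then preprocess on the server.
    """
    on_device = device_preprocess(image_bytes) + transfer(resized_bytes)
    on_cloud = transfer(image_bytes) + cloud_preprocess(image_bytes)
    return ("device", on_device) if on_device <= on_cloud else ("cloud", on_cloud)

# Illustrative coefficients only; real models would be fit to measurements.
device_pre = linear_model(2e-6, 15.0)   # slower mobile CPU preprocessing
cloud_pre = linear_model(5e-7, 2.0)     # faster server preprocessing
slow_net = linear_model(8e-5, 40.0)     # costly per-byte network transfer

loc, est_ms = choose_preprocess_location(
    image_bytes=4_000_000, resized_bytes=150_000,
    device_preprocess=device_pre, cloud_preprocess=cloud_pre,
    transfer=slow_net)
print(loc, round(est_ms, 1))
```

On this slow link, preprocessing on-device wins because it shrinks the payload before transfer; with a cheap per-byte transfer model, the same function flips to "cloud", since sending the raw image becomes inexpensive relative to slow on-device preprocessing.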


Published in

ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2021, 301 pages
ISBN: 9781450381949
DOI: 10.1145/3427921
          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICPE '21 paper acceptance rate: 16 of 61 submissions, 26%. Overall acceptance rate: 252 of 851 submissions, 30%.
