ABSTRACT
Executing deep-learning inference on cloud servers enables mobile devices with limited resources to use high-complexity models. However, pre-execution time (the time it takes to prepare and transfer data to the cloud) is variable and can take orders of magnitude longer to complete than inference execution itself. This pre-execution time can be reduced by dynamically deciding the order of two essential steps, preprocessing and data transfer, to better take advantage of on-device resources and network conditions. In this work, we present PieSlicer, a system that uses linear regression models to make dynamic preprocessing decisions and improve cloud inference performance. PieSlicer leverages these models to select the appropriate preprocessing location for each input. We show that for image classification applications PieSlicer reduces median and 99th percentile pre-execution time by up to 50.2 ms and 217.2 ms, respectively, when compared to static preprocessing methods.
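The core decision the abstract describes can be sketched as follows: predict pre-execution time for each preprocessing location with linear latency models, then pick the cheaper option. This is a minimal illustrative sketch, not PieSlicer's actual implementation; the model coefficients, function names, and sizes below are hypothetical stand-ins for values that would be fit by regression on measured timings.

```python
# Hypothetical sketch of PieSlicer-style preprocessing placement.
# All coefficients are illustrative; a real system would fit them
# with linear regression over measured device, network, and server timings.

def linear_model(coef, intercept):
    """Return a latency predictor of the form t(x) = coef * x + intercept (ms)."""
    return lambda x: coef * x + intercept

# Assumed latency models (slope in ms per KB of input, intercept in ms).
device_preprocess = linear_model(0.05, 2.0)   # on-device resize/encode
transfer = linear_model(0.8, 10.0)            # network transfer of x KB
cloud_preprocess = linear_model(0.01, 1.0)    # server-side preprocessing

def choose_location(raw_kb, preprocessed_kb):
    """Pick where to preprocess by comparing predicted pre-execution times."""
    # Option A: preprocess on device, then transfer the (usually smaller) result.
    t_device = device_preprocess(raw_kb) + transfer(preprocessed_kb)
    # Option B: transfer the raw input and preprocess on the server.
    t_cloud = transfer(raw_kb) + cloud_preprocess(raw_kb)
    return ("device", t_device) if t_device <= t_cloud else ("cloud", t_cloud)
```

Under these assumed coefficients, a large raw image that shrinks substantially during preprocessing favors on-device preprocessing, while an input that preprocessing does not shrink favors sending it raw; the crossover point shifts as network conditions change the transfer model.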
PieSlicer: Dynamically Improving Response Time for Cloud-based CNN Inference