DOI: 10.1145/3631295.3631401

On Serving Image Classification Models

Published: 11 December 2023

Abstract

This paper aims to optimize model inference in interactive applications by reducing infrastructure costs. It seeks to improve resource utilization, lower costs, and enhance the scalability and responsiveness of model serving systems. The focus is on efficient inference for computer vision, though the approach has potential applications in other domains. The study involved experiments on a single GPU to analyze the impact of input image size and mini-batch size on request delivery time for image classification. Key findings include a model that estimates GPU warm-up time from four parameters, confirmation of a linear relationship between mini-batch size and inference time for a given model, and the need to consider input size when selecting the mini-batch size to avoid GPU crashes. Additionally, two mathematical models are proposed for further exploration with optimization algorithms. We also motivate the need for a more comprehensive mathematical model for soft and relaxed inference model serving.
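
To make the experimental setup concrete, the sketch below illustrates how such a single-GPU experiment might look: it times the forward pass of an image classification model across mini-batch sizes after an explicit warm-up phase, then fits the linear latency-versus-batch-size trend mentioned above. This is an illustrative sketch, not the paper's actual harness; the model choice (torchvision ResNet-50), input resolution, batch sizes, and iteration counts are assumptions.

import time
import numpy as np
import torch
import torchvision.models as models

device = torch.device("cuda")
# Assumed model; untrained weights are fine since only timing is measured.
model = models.resnet50(weights=None).eval().to(device)

def mean_latency(batch_size, image_size=224, warmup=5, iters=20):
    """Average forward-pass time (seconds) for one mini-batch of the given size."""
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: first calls pay one-off CUDA/init costs
            model(x)
        torch.cuda.synchronize()         # ensure warm-up kernels have finished
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()         # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / iters

batch_sizes = [1, 2, 4, 8, 16, 32]
latencies = [mean_latency(b) for b in batch_sizes]

# Least-squares fit of the (approximately) linear latency-vs-mini-batch-size trend.
slope, intercept = np.polyfit(batch_sizes, latencies, 1)
print(f"latency ~= {slope:.5f} * batch_size + {intercept:.5f} s")

Separating warm-up iterations from timed iterations matters because the first GPU calls include one-off costs (context creation, memory allocation, kernel selection), which is also why warm-up time is worth modelling as a quantity of its own.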

Published In

WoSC '23: Proceedings of the 9th International Workshop on Serverless Computing
December 2023
68 pages
ISBN:9798400704550
DOI:10.1145/3631295
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU utilization
  2. model inference
  3. resource allocation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '23
