Abstract:
Numerous organizations are adopting sophisticated Machine Learning (ML) algorithms for their operations. To ensure that ML systems perform well, organizations need insight into how these systems respond under realistic user workloads. Despite the widespread adoption of ML models, however, research on predicting the response time of a system serving an ML model under varying resources and user workloads is limited. In this paper, we address this gap by proposing a modeling approach that predicts the response times of several well-known Deep Neural Networks (DNNs) while resource settings and user workloads vary simultaneously. We combine a classifier and a regressor: the classifier identifies the optimal resource setting for meeting a DNN's response time target, and the regressor predicts the response time under the allocated resource setting. Our technique enables performance modeling without collecting extensive data during system operation, which makes pre-deployment predictions possible. The results show that our approach generalizes to unseen resource and workload scenarios, correctly predicting compliance with response time targets 98.05% of the time and predicting response times with a mean prediction error of 9.10%.
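The classifier-plus-regressor pipeline described above can be illustrated with a minimal sketch. It is not the authors' model. It assumes hypothetical features (CPU core count and concurrent users), uses k-nearest-neighbour models chosen only for brevity, and trains on synthetic toy measurements. The names `classify_compliant`, `regress_response_time`, `plan`, and `mean_percentage_error` are hypothetical. The sketch also assumes that "mean prediction error" means mean absolute percentage error, which the abstract does not state.

```python
import math
from statistics import mean

# Hypothetical, synthetic training data: (cpu_cores, concurrent_users) -> response time (ms).
# Features are left unscaled for brevity; a real model would normalize them.
SAMPLES = [
    ((1, 10), 120.0), ((1, 50), 480.0), ((2, 10), 70.0), ((2, 50), 260.0),
    ((4, 10), 45.0), ((4, 50), 140.0), ((8, 10), 35.0), ((8, 50), 90.0),
]

def _nearest(x, k):
    """Return the k training samples closest to feature vector x (Euclidean distance)."""
    return sorted(SAMPLES, key=lambda s: math.dist(s[0], x))[:k]

def classify_compliant(cores, users, target_ms, k=3):
    """k-NN classifier: majority vote on whether the neighbours met the response time target."""
    votes = [rt <= target_ms for _, rt in _nearest((cores, users), k)]
    return sum(votes) > len(votes) / 2

def regress_response_time(cores, users, k=2):
    """k-NN regressor: mean response time of the k nearest neighbours."""
    return mean(rt for _, rt in _nearest((cores, users), k))

def plan(users, target_ms, candidates=(1, 2, 4, 8)):
    """Pick the smallest candidate core count the classifier deems compliant,
    then predict the response time under that setting. Returns None if none qualifies."""
    for cores in sorted(candidates):
        if classify_compliant(cores, users, target_ms):
            return cores, regress_response_time(cores, users)
    return None

def mean_percentage_error(actual, predicted):
    """Mean absolute percentage error, in percent (assumed error metric)."""
    return 100 * mean(abs(a - p) / a for a, p in zip(actual, predicted))
```

For example, `plan(50, 200)` walks through the candidate settings and returns `(8, 115.0)` on this toy data, while `plan(50, 50)` returns `None` because no setting is classified as compliant. Splitting the decision (which setting?) from the estimate (how fast?) mirrors the two-model structure the abstract describes.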
Date of Conference: 30 October 2023 - 02 November 2023
Date Added to IEEE Xplore: 28 November 2023