Training and Serving Machine Learning Models at Scale

  • Conference paper
  • Service-Oriented Computing (ICSOC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13740)

Abstract

In recent years, Web services have become more and more intelligent (e.g., in understanding user preferences) thanks to the integration of components that rely on Machine Learning (ML). Before users can interact with an ML-based service (ML-Service) during the inference phase, the underlying ML model must learn from existing data during the training phase, a process that requires long-lasting batch computations. Managing these two diverse phases is complex, and time and quality requirements can hardly be met with manual approaches.

This paper highlights some of the major issues in managing ML-Services in both training and inference modes and presents some initial solutions that meet set requirements with minimal user input. A preliminary evaluation demonstrates that our solutions make these systems more efficient and predictable with respect to their response time and accuracy.


Acknowledgments

This work has been partially supported by the SISMA (MIUR, PRIN 2017, Contract 201752ENYB) and EMELIOT (MUR, PRIN 2020, Contract 2020W3A5FY) national research projects.

Author information

Corresponding author

Correspondence to Giovanni Quattrocchi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Baresi, L., Quattrocchi, G. (2022). Training and Serving Machine Learning Models at Scale. In: Troya, J., Medjahed, B., Piattini, M., Yao, L., Fernández, P., Ruiz-Cortés, A. (eds) Service-Oriented Computing. ICSOC 2022. Lecture Notes in Computer Science, vol 13740. Springer, Cham. https://doi.org/10.1007/978-3-031-20984-0_48

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20984-0_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20983-3

  • Online ISBN: 978-3-031-20984-0

  • eBook Packages: Computer Science, Computer Science (R0)
