From a Data Science Driven Process to a Continuous Delivery Process for Machine Learning Systems

Lwakatare, Lucy Ellen; Crnkovic, Ivica; Rånge, Ellinor; Bosch, Jan

doi:10.1007/978-3-030-64148-1_12

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12562)

Included in the following conference series:

International Conference on Product-Focused Software Process Improvement

Abstract

Development of machine learning (ML) enabled applications in real-world settings is challenging and requires the consideration of sound software engineering (SE) principles and practices. A large body of knowledge exists on the use of modern approaches to developing traditional software components, but not ML components. Using an exploratory case study approach, this study investigates the adoption and use of existing software development approaches, specifically continuous delivery (CD), in the development of ML components. Research data was collected using a multivocal literature review (MLR) and a focus group with ten practitioners involved in developing ML-enabled systems at a large telecommunication company. The results of our MLR show that companies do not apply CD to the development of ML components outright; rather, they arrive at it as a result of improving their development practices and infrastructure over time. A process improvement conceptual model that includes a description of how CD is applied to ML components is developed and initially validated in the study.

Notes

1. By AI-enabled systems we mean the software systems that include ML components.

Acknowledgement

This research was supported by Software Center, Chalmers AI Research Centre (CHAIR), and the Vinnova project HoliDev. The authors would also like to thank all the participants of the focus group discussions.

Author information

Authors and Affiliations

Chalmers University of Technology, Gothenburg University, Gothenburg, Sweden
Lucy Ellen Lwakatare, Ivica Crnkovic & Jan Bosch
Ericsson, Gothenburg, Sweden
Ellinor Rånge

Corresponding author

Correspondence to Lucy Ellen Lwakatare.

Editor information

Editors and Affiliations

Politecnico di Torino, Turin, Torino, Italy
Maurizio Morisio
Polytechnic University of Turin, Turin, Torino, Italy
Marco Torchiano
Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Rheinland-Pfalz, Germany
Andreas Jedlitschka

Cite this paper

Lwakatare, L.E., Crnkovic, I., Rånge, E., Bosch, J. (2020). From a Data Science Driven Process to a Continuous Delivery Process for Machine Learning Systems. In: Morisio, M., Torchiano, M., Jedlitschka, A. (eds) Product-Focused Software Process Improvement. PROFES 2020. Lecture Notes in Computer Science, vol 12562. Springer, Cham. https://doi.org/10.1007/978-3-030-64148-1_12

DOI: https://doi.org/10.1007/978-3-030-64148-1_12
Published: 21 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64147-4
Online ISBN: 978-3-030-64148-1
eBook Packages: Computer Science, Computer Science (R0)
