Abstract
Developing machine learning (ML) enabled applications in real-world settings is challenging and requires sound software engineering (SE) principles and practices. A large body of knowledge exists on the use of modern approaches to developing traditional software components, but not ML components. Using an exploratory case study approach, this study investigates the adoption and use of existing software development approaches, specifically continuous delivery (CD), in the development of ML components. Research data was collected using a multivocal literature review (MLR) and a focus group technique with ten practitioners involved in developing ML-enabled systems at a large telecommunication company. The results of our MLR show that companies do not apply CD to the development of ML components outright; rather, they arrive at it as a result of improving their development practices and infrastructure over time. A process improvement conceptual model, which includes a description of how CD is applied to ML components, is developed and initially validated in the study.
Notes
- 1. By AI-enabled systems we mean the software systems that include ML components.
Acknowledgement
This research was supported by Software Center, Chalmers AI Research Centre (CHAIR), and the Vinnova project HoliDev. The authors would also like to thank all the participants of the focus group discussions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lwakatare, L.E., Crnkovic, I., Rånge, E., Bosch, J. (2020). From a Data Science Driven Process to a Continuous Delivery Process for Machine Learning Systems. In: Morisio, M., Torchiano, M., Jedlitschka, A. (eds) Product-Focused Software Process Improvement. PROFES 2020. Lecture Notes in Computer Science, vol 12562. Springer, Cham. https://doi.org/10.1007/978-3-030-64148-1_12
DOI: https://doi.org/10.1007/978-3-030-64148-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64147-4
Online ISBN: 978-3-030-64148-1
eBook Packages: Computer Science, Computer Science (R0)