
From a Data Science Driven Process to a Continuous Delivery Process for Machine Learning Systems

  • Conference paper
Product-Focused Software Process Improvement (PROFES 2020)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12562)

Abstract

Developing machine learning (ML)-enabled applications in real-world settings is challenging and requires sound software engineering (SE) principles and practices. A large body of knowledge exists on using modern approaches to develop traditional software components, but not ML components. Using an exploratory case study approach, this study investigates the adoption and use of existing software development approaches, specifically continuous delivery (CD), in the development of ML components. Research data was collected through a multivocal literature review (MLR) and focus groups with ten practitioners involved in developing ML-enabled systems at a large telecommunication company. The results of our MLR show that companies do not apply CD to the development of ML components outright; rather, they arrive at it as a result of improving their development practices and infrastructure over time. A conceptual model for process improvement, which includes a description of how CD applies to ML components, is developed and initially validated in the study.


Notes

  1. By AI-enabled systems we mean software systems that include ML components.


Acknowledgement

This research was supported by Software Center, Chalmers AI Research Centre (CHAIR), and Vinnova project HoliDev. The authors would also like to thank all the participants of focus group discussions.

Author information

Correspondence to Lucy Ellen Lwakatare.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Lwakatare, L.E., Crnkovic, I., Rånge, E., Bosch, J. (2020). From a Data Science Driven Process to a Continuous Delivery Process for Machine Learning Systems. In: Morisio, M., Torchiano, M., Jedlitschka, A. (eds) Product-Focused Software Process Improvement. PROFES 2020. Lecture Notes in Computer Science, vol 12562. Springer, Cham. https://doi.org/10.1007/978-3-030-64148-1_12


  • DOI: https://doi.org/10.1007/978-3-030-64148-1_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64147-4

  • Online ISBN: 978-3-030-64148-1

  • eBook Packages: Computer Science, Computer Science (R0)
