Automated end-to-end management of the modeling lifecycle in deep learning

Gharibi, Gharib; Walunj, Vijay; Nekadi, Raju; Marri, Raj; Lee, Yugyung

doi:10.1007/s10664-020-09894-9

Published: 19 February 2021

Volume 26, article number 17, (2021)

Empirical Software Engineering

Gharib Gharibi ORCID: orcid.org/0000-0003-0062-4748¹,
Vijay Walunj¹,
Raju Nekadi¹,
Raj Marri¹ &
…
Yugyung Lee¹

1017 Accesses
15 Citations
3 Altmetric

Abstract

Deep learning has improved the state-of-the-art results in an ever-growing number of domains. This success relies heavily on the development and training of deep learning models, an experimental, iterative process that produces tens to hundreds of models before arriving at a satisfactory result. While there has been a surge in the number of tools and frameworks that aim to facilitate deep learning, the process of managing the models and their artifacts is still surprisingly challenging and time-consuming. Existing model-management solutions are either tailored for commercial platforms or require significant code changes. Moreover, most existing solutions address a single phase of the modeling lifecycle, such as experiment monitoring, while ignoring other essential tasks, such as model deployment. In this paper, we present ModelKB, a software system that facilitates and accelerates the deep learning lifecycle. ModelKB can automatically manage the modeling lifecycle end-to-end, including (1) monitoring and tracking experiments; (2) visualizing, searching for, and comparing models and experiments; (3) deploying models locally and on the cloud; and (4) sharing and publishing trained models. Moreover, our system provides a stepping-stone for enhanced reproducibility. ModelKB currently supports TensorFlow 2.0, Keras, and PyTorch, and it can be easily extended to other deep learning frameworks.
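To make the first two capabilities in the abstract concrete (tracking experiments, then searching and comparing them), the following is a minimal sketch and not ModelKB's actual implementation. It assumes experiments are stored as rows in a local SQLite database with hyperparameters and metrics serialized as JSON; the table layout and the function names `open_store`, `log_experiment`, `best_experiment`, and `diff_hyperparams` are hypothetical and chosen only for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical experiment store backed by SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiments ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL, "
        "framework TEXT NOT NULL, "
        "hyperparams TEXT NOT NULL, "
        "metrics TEXT NOT NULL)"
    )
    return conn


def log_experiment(conn, name, framework, hyperparams, metrics):
    """Record one training run and return its row id."""
    cur = conn.execute(
        "INSERT INTO experiments (name, framework, hyperparams, metrics) "
        "VALUES (?, ?, ?, ?)",
        (
            name,
            framework,
            json.dumps(hyperparams, sort_keys=True),
            json.dumps(metrics, sort_keys=True),
        ),
    )
    conn.commit()
    return cur.lastrowid


def best_experiment(conn, metric):
    """Return the run with the highest value of `metric`, or None if no run has it."""
    rows = conn.execute("SELECT id, name, metrics FROM experiments").fetchall()
    scored = []
    for exp_id, name, metrics_json in rows:
        value = json.loads(metrics_json).get(metric)
        if value is not None:
            scored.append((value, exp_id, name))
    if not scored:
        return None
    value, exp_id, name = max(scored, key=lambda item: item[0])
    return {"id": exp_id, "name": name, metric: value}


def _load_hyperparams(conn, exp_id):
    """Fetch and decode the stored hyperparameters of one run."""
    row = conn.execute(
        "SELECT hyperparams FROM experiments WHERE id = ?", (exp_id,)
    ).fetchone()
    if row is None:
        raise KeyError(exp_id)
    return json.loads(row[0])


def diff_hyperparams(conn, id_a, id_b):
    """Map each hyperparameter that differs between two runs to its (a, b) values."""
    a = _load_hyperparams(conn, id_a)
    b = _load_hyperparams(conn, id_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

In this sketch a caller would invoke `log_experiment` once per training run and later use `best_experiment` and `diff_hyperparams` to see which run performed best and what changed between two runs. Note that the explicit logging call is exactly the kind of manual code change the abstract says an automated system aims to avoid; it is used here only to keep the example self-contained.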

Acknowledgements

We would like to thank Sirisha Rella and Duy Ho for their help with parts of the implementation in early versions of ModelKB. We would also like to thank the Ph.D. students and the industry participants who helped conduct the user study and evaluate the software system. We also thank the anonymous reviewers for their time and effort in reviewing this work. The first author thanks Yasmin Hussein for her help and support throughout this work. The coauthor, Yugyung Lee, would like to acknowledge the partial support of NSF Grant No. 1747751.

Author information

Authors and Affiliations

School of Computing and Engineering, University of Missouri-Kansas City, 5000 Holmes St, Kansas City, MO, 64110, USA
Gharib Gharibi, Vijay Walunj, Raju Nekadi, Raj Marri & Yugyung Lee

Corresponding author

Correspondence to Gharib Gharibi.

Additional information

Communicated by: Tim Menzies, Chakkrit Tantithamthavorn and Burak Turhan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Software Engineering in the Age of Artificial Intelligence

About this article

Cite this article

Gharibi, G., Walunj, V., Nekadi, R. et al. Automated end-to-end management of the modeling lifecycle in deep learning. Empir Software Eng 26, 17 (2021). https://doi.org/10.1007/s10664-020-09894-9

Accepted: 27 November 2020
Published: 19 February 2021
DOI: https://doi.org/10.1007/s10664-020-09894-9
