MLife: a lite framework for machine learning lifecycle initialization

Journal: Machine Learning

Abstract

The machine learning (ML) lifecycle is a cyclic process for building an efficient ML system. Although many commercial and community (non-commercial) frameworks have been proposed to streamline the major stages of the ML lifecycle, they are typically overqualified yet insufficient for an ML system in its nascent phase. Drawing on real-world experience in building and maintaining ML systems, we find that it is more efficient to first initialize the major stages of the ML lifecycle for trial and error, and then extend specific stages to adapt to more complex scenarios. To this end, we introduce MLife, a simple yet flexible framework for fast ML lifecycle initialization. MLife is built on the observation that its data flow forms a closed loop driven by bad cases, especially those that impact ML model performance the most while also providing the most value for further ML model development, a key factor in enabling enterprises to fast-track their ML capabilities. MLife is also flexible enough to be extended to more complex scenarios during future maintenance. We present two real-world use cases to demonstrate that MLife is particularly suitable for ML systems in their early phases.
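As a rough illustration of what a closed loop driven by bad cases can look like, the following Python sketch trains a toy model, collects the samples it gets wrong, and feeds the most severe ones back into the training set. It is not the MLife implementation: the class `BadCaseLoop`, the threshold "model", the distance-based ranking, and the `budget` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature, label); hypothetical toy data


@dataclass
class BadCaseLoop:
    """Toy closed loop: train -> evaluate -> collect bad cases -> retrain."""

    train_set: List[Sample]            # must contain both classes (0 and 1)
    threshold: float = 0.0             # the entire "model" in this sketch
    history: List[int] = field(default_factory=list)

    def train(self) -> None:
        # Hypothetical model: a threshold halfway between the class means.
        pos = [x for x, y in self.train_set if y == 1]
        neg = [x for x, y in self.train_set if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict(self, x: float) -> int:
        return int(x > self.threshold)

    def collect_bad_cases(self, pool: List[Sample]) -> List[Sample]:
        # A bad case here is a misclassified sample. Ranking by distance
        # from the decision boundary is an assumed proxy for "impact".
        bad = [(x, y) for x, y in pool if self.predict(x) != y]
        return sorted(bad, key=lambda s: abs(s[0] - self.threshold), reverse=True)

    def step(self, pool: List[Sample], budget: int) -> List[Sample]:
        # One pass around the loop: retrain, find bad cases, and feed the
        # top `budget` of them back into the training data.
        self.train()
        bad = self.collect_bad_cases(pool)
        self.history.append(len(bad))
        self.train_set.extend(bad[:budget])
        return bad


if __name__ == "__main__":
    loop = BadCaseLoop(train_set=[(0.0, 0), (1.0, 1)])
    pool = [(0.4, 1), (0.45, 1), (0.2, 0), (0.9, 1)]
    for _ in range(2):
        loop.step(pool, budget=2)
    print(loop.history)  # number of bad cases found in each iteration
```

In a real system each stage (training, evaluation, bad-case collection, relabeling) would be a separate component; the sketch only shows how the data flow closes on itself.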

Acknowledgements

This work is supported by funding from Clobotics and Horizon Robotics under the Research Program of Smart Retail and Driver Monitoring System, respectively, and in part by CREST R&D Grant T03C1-17, Malaysia.

Author information

Corresponding author

Correspondence to Yipeng Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Editors: João Gama, Alípio Jorge, Salvador García.

About this article

Cite this article

Yang, C., Wang, W., Zhang, Y. et al. MLife: a lite framework for machine learning lifecycle initialization. Mach Learn 110, 2993–3013 (2021). https://doi.org/10.1007/s10994-021-06052-0

  • DOI: https://doi.org/10.1007/s10994-021-06052-0
