DOI: 10.1145/3607720.3607744
research-article

A private cloud-based Datalab for scalable DSML pipelines

Published: 13 November 2023

ABSTRACT

In an era where businesses continuously try to conquer new markets and attract new customers, Big Data usage is becoming a strategic enabler to support such ambitions. The upward trend of enterprises offering omnichannel experiences to customers and prospects gives rise to vast amounts of non-traditional data, which must be leveraged and turned into actionable insights. Enterprises are becoming aware of these challenges and are trying to integrate their existing corporate data with newly acquired non-traditional Big Data to unlock meaningful use cases. The IT landscape has kept pace with these trends, offering modern technology stacks in infrastructure, Big Data platforms, and artificial intelligence tools and frameworks. However, the path toward data analytics in the enterprise, given business requirements, time to market, and industrialization constraints, remains a real challenge. Implementing a data analytics use case efficiently requires heterogeneous steps ranging from data preparation, model training, validation, and serving to managing the product lifecycle, so as to guarantee sustainability and ultimately reach the expected business outcomes. Therefore, the enterprises’ data ecosystem should support these steps by providing a suitable platform that enables data scientists to collaborate and gives them the tools to unleash data analytics at scale. This paper addresses these needs and suggests a scalable cloud-ready architecture that fits data science pipelines with all their steps, specificities, and requirements. The propositions made in this contribution are driven by business ambitions and confronted with a benchmark that presents the solutions available in the IT marketplace, with careful consideration of their strengths and weaknesses. Finally, the suggested architecture is presented and discussed, along with perspectives and future research areas.

Published in

NISS '23: Proceedings of the 6th International Conference on Networking, Intelligent Systems & Security
May 2023
451 pages
ISBN: 9798400700194
DOI: 10.1145/3607720

Copyright © 2023 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited