ABSTRACT
In an era where businesses continuously try to conquer new markets and attract new customers, Big Data usage is becoming a strategic enabler to support such ambitions. The upward trend of enterprises offering omnichannel experiences to customers and prospects gives rise to vast amounts of data from non-traditional sources, which need to be turned into actionable insights. Enterprises are becoming aware of these challenges and are trying to integrate their existing corporate data with newly acquired non-traditional Big Data to unlock meaningful use cases. The IT landscape has evolved alongside these trends, offering modern technology stacks in infrastructure, Big Data platforms, and artificial intelligence tools and frameworks. However, the path toward data analytics in the enterprise remains a real challenge with respect to business requirements, time to market, and industrialization constraints. Implementing a data analytics use case efficiently requires heterogeneous steps ranging from data preparation, model training, validation, and serving to managing the product lifecycle, in order to guarantee sustainability and ultimately reach the expected business outcomes. Therefore, the enterprise's data ecosystem should support these steps by providing a suitable platform that enables data scientists to collaborate and gives them the tools to unleash data analytics at scale. This paper addresses these needs and suggests a scalable cloud-ready architecture that fits data science pipelines with all their steps, specificities, and requirements. The propositions made in this contribution are driven by business ambitions and compared against a benchmark of the solutions available in the IT marketplace, with careful consideration of their strengths and weaknesses. Finally, the suggested architecture is presented and discussed, along with perspectives and future research areas.
Index Terms
- A private cloud-based Datalab for scalable DSML pipelines