ABSTRACT
Driven by the rapid advance of machine learning across a wide range of application areas, the need for machine learning frameworks that are both effective and easy for novices to use has increased dramatically. Furthermore, building machine learning models in big data environments remains a major challenge. In this paper, we tackle these challenges by introducing a new generic framework that facilitates the training, testing, managing, storing, and retrieving of machine learning models in the context of big data. The framework combines a powerful big data software stack with a microservice architecture to provide a fully manageable and highly scalable solution. A highly configurable user interface lets the user easily train, test, and manage machine learning models. Moreover, the framework automatically indexes models and allows them to be explored flexibly in the visual interface. The performance of the new framework is evaluated on state-of-the-art machine learning algorithms: the results show that storing and retrieving machine learning models incurs an acceptably low overhead, demonstrating an efficient approach to facilitating machine learning in big data environments.
Index Terms
- Facilitating and Managing Machine Learning and Data Analysis Tasks in Big Data Environments using Web and Microservice Technologies