Abstract
Driven by recent advances of machine learning in a wide range of application areas, the need for easy-to-use frameworks that let novices and users who are not data analytics experts apply machine learning effectively has increased dramatically. Furthermore, building machine learning models in Big Data environments remains a great challenge. The present article addresses these challenges by introducing a new generic framework that facilitates the training, testing, managing, storing and retrieving of machine learning models in the context of Big Data. The framework combines a powerful Big Data software stack, web technologies and a microservice architecture into a fully manageable and highly scalable solution. A highly configurable user interface hides platform details from the user and makes it easy to train, test and manage machine learning models. Moreover, the framework automatically indexes and characterizes models and allows flexible exploration of them in the visual interface. The performance and usability of the new framework are evaluated on state-of-the-art machine learning algorithms: executing, storing and retrieving machine learning models via the framework incurs an acceptably low overhead, which demonstrates that the framework can provide an efficient approach for facilitating machine learning in Big Data environments. The evaluation also examines how configuration options (e.g. caching of RDDs in Apache Spark) affect runtime performance. Furthermore, it provides indicators for when distributed computing (i.e. parallel computation) based on Apache Spark on a cluster outperforms execution of a machine learning model on a single computer.
Copyright information
© 2020 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Shahoud, S., Gunnarsdottir, S., Khalloof, H., Duepmeier, C., Hagenmeyer, V. (2020). Facilitating and Managing Machine Learning and Data Analysis Tasks in Big Data Environments Using Web and Microservice Technologies. In: Hameurlain, A., et al. Transactions on Large-Scale Data- and Knowledge-Centered Systems XLV. Lecture Notes in Computer Science, vol 12390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62308-4_6
DOI: https://doi.org/10.1007/978-3-662-62308-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-62307-7
Online ISBN: 978-3-662-62308-4
eBook Packages: Computer Science (R0)