Abstract
This paper discusses the role of Machine Learning (ML) based algorithms and methods in Big Data Processing and Analytics (BDA). ML and BDA are both evolving fields of computing, and developments in each complement the other. The ever-changing data landscape of the modern digital world has given rise to new data processing frameworks designed to obtain unprecedented, meaningful insights. This paper presents a detailed review of the latest developments in ML algorithms for Big Data processing. A later section discusses the key challenges associated with applying ML based approaches. ML based Big Data processing has gained popularity, and the field is witnessing a rapid emergence of new methods and approaches for efficient data processing that discover interesting patterns for decision making. With the surge of data from newer sources, the heterogeneous nature of that data, and its uncertain and unstructured character, so-called Big Data with all its characteristics (the 5 Vs) creates an ever-increasing need for approaches that aid in modelling and processing the data and that automate data processing. These new processing requirements have given a big boost to the development of new ML based methods for managing and processing such data. The paper will be useful to scholars researching this interesting and challenging domain of ML and Big Data processing.
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Bhatnagar, R. (2018). Machine Learning and Big Data Processing: A Technological Perspective and Review. In: Hassanien, A., Tolba, M., Elhoseny, M., Mostafa, M. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018). AMLTA 2018. Advances in Intelligent Systems and Computing, vol 723. Springer, Cham. https://doi.org/10.1007/978-3-319-74690-6_46
DOI: https://doi.org/10.1007/978-3-319-74690-6_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74689-0
Online ISBN: 978-3-319-74690-6
eBook Packages: Engineering, Engineering (R0)