Machine Learning and Big Data Processing: A Technological Perspective and Review

  • Conference paper
The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018) (AMLTA 2018)

Abstract

This paper discusses the role of Machine Learning (ML) based algorithms and methods in Big Data Processing and Analytics (BDA). ML and BDA are both evolving fields of computing, and developments in each complement the other. The ever-changing data landscape of the modern digital world has led to new data processing frameworks designed to extract unprecedented, meaningful insights. This paper presents a detailed review of the latest developments in ML algorithms for Big Data Processing. A later section also discusses the key challenges associated with applying ML-based approaches. ML-based Big Data Processing has gained popularity, and new methods and approaches are emerging rapidly to process data efficiently and to discover interesting patterns for decision making. With the surge of data from newer sources, the heterogeneous nature of that data, and its uncertain and unstructured form, Big Data with all its characteristics (the 5 Vs) creates an ever-increasing need for approaches that aid in modelling and processing such data and that automate data processing. These new processing requirements have given a big boost to the development of new ML-based methods for managing and processing Big Data. The paper will be useful to scholars researching this interesting and challenging domain of ML and Big Data Processing.

Author information

Corresponding author

Correspondence to Roheet Bhatnagar.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Bhatnagar, R. (2018). Machine Learning and Big Data Processing: A Technological Perspective and Review. In: Hassanien, A., Tolba, M., Elhoseny, M., Mostafa, M. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018). AMLTA 2018. Advances in Intelligent Systems and Computing, vol 723. Springer, Cham. https://doi.org/10.1007/978-3-319-74690-6_46

  • DOI: https://doi.org/10.1007/978-3-319-74690-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74689-0

  • Online ISBN: 978-3-319-74690-6

  • eBook Packages: Engineering, Engineering (R0)
