DOI: 10.1145/2791405.2791549
research-article

A Theoretical Model for Big Data Analytics using Machine Learning Algorithms

Published: 10 August 2015

Abstract

Big Data processing is becoming increasingly important because of the continuous growth in the amount of data generated across many fields. Big Data architectures typically span multiple machines and clusters composed of various subsystems. To speed up processing, a unified approach to machine learning is applied on the MapReduce framework: MapReduce, a broadly applicable programming model, is applied to several algorithms from the machine learning family to support business decision-making. This paper presents parallel implementations of machine learning algorithms, including K-Means and Logistic Regression, built on top of the MapReduce model.
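The abstract does not describe how the algorithms are decomposed, so the following is only a minimal sketch, assuming each algorithm is written so that mappers compute per-record statistics and reducers sum them. It simulates a MapReduce job inside a single Python process rather than running on Hadoop, and the helper names `map_reduce`, `kmeans_iteration` and `logistic_regression_step` are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical single-process stand-in for one MapReduce job.

    mapper(record) yields (key, value) pairs; pairs are grouped by key
    (the "shuffle"), and reducer(key, values) produces one result per key.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_iteration(points, centroids):
    """One K-Means iteration: map assigns points, reduce averages them."""
    def mapper(p):
        # Emit (index of nearest centroid, (point, count of 1)).
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        yield nearest, (p, 1)

    def reducer(_, values):
        # Sum coordinates and counts, then divide to get the new centroid.
        total = [0.0] * len(values[0][0])
        count = 0
        for p, c in values:
            total = [t + x for t, x in zip(total, p)]
            count += c
        return [t / count for t in total]

    new = map_reduce(points, mapper, reducer)
    # A centroid with no assigned points keeps its previous position.
    return [new.get(i, centroids[i]) for i in range(len(centroids))]


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_regression_step(rows, weights, lr):
    """One batch gradient-ascent step on the log-likelihood.

    rows: non-empty list of (features, label) with label in {0, 1}.
    Each mapper emits its row's gradient contribution under one key;
    the reducer sums those contributions component-wise.
    """
    def mapper(row):
        x, y = row
        error = y - sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        yield "grad", [error * xi for xi in x]

    def reducer(_, grads):
        return [sum(g) for g in zip(*grads)]

    grad = map_reduce(rows, mapper, reducer)["grad"]
    return [w + lr * g for w, g in zip(weights, grad)]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_iteration(points, [[0.0, 0.0], [10.0, 10.0]]))
```

In this sketch each call is one job; a driver program would rerun it until the centroids or weights stop changing. On a real cluster, emitting partial sums from a combiner instead of raw points would reduce the data moved during the shuffle.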


Cited By

  • Machine learning approaches on map reduce for Big Data analytics. In Proceedings of the 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pp. 480-484, 2015. DOI: 10.1109/ICGCIoT.2015.7380512. Online publication date: 8 October 2015.

Published In

ACM Other conferences
WCI '15: Proceedings of the Third International Symposium on Women in Computing and Informatics
August 2015
763 pages
ISBN:9781450333610
DOI:10.1145/2791405
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Hadoop
  2. Logistic Regression
  3. MapReduce
  4. Parallel Implementation
  5. Serial Implementation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WCI '15

Acceptance Rates

WCI '15 paper acceptance rate: 98 of 452 submissions (22%)
Overall acceptance rate: 98 of 452 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Jan 2025