research-article

A Theoretical Model for Big Data Analytics using Machine Learning Algorithms

Authors:

Ananthi Sheshasaayee,

J. V. N. Lakshmi

WCI '15: Proceedings of the Third International Symposium on Women in Computing and Informatics

Pages 635 - 639

https://doi.org/10.1145/2791405.2791549

Published: 10 August 2015

Abstract

Big Data processing is becoming increasingly important because of the continuous growth in the amount of data generated across many fields. A Big Data architecture typically spans multiple machines and clusters made up of various subsystems. To speed up processing, machine learning is applied in a unified way on the MapReduce framework. MapReduce, a broadly applicable programming model, is applied to different algorithms from the machine learning family to support business decisions. This paper presents parallel implementations of several machine learning algorithms, including K-Means and Logistic Regression, built on top of the MapReduce model.
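The paper's own implementation is not reproduced on this page. As a rough illustration of how K-Means and Logistic Regression can be expressed as map and reduce steps, here is a minimal single-process Python sketch. It assumes a "summation form" decomposition: each map task computes partial sums over its own data shard, and a reduce step combines them. The driver `run_map_reduce`, the shard layout, the learning rate `lr`, and all other function names are hypothetical. They simulate the pattern in memory and are not Hadoop's API.

```python
import math
from collections import defaultdict


def run_map_reduce(shards, mapper, reducer):
    """Hypothetical in-memory driver: map every shard, group values by key, reduce each group."""
    groups = defaultdict(list)
    for shard in shards:                      # each shard stands in for one map task's input split
        for key, value in mapper(shard):
            groups[key].append(value)         # simulated shuffle: collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}


# ---- K-Means: one iteration = one MapReduce pass ----
def kmeans_mapper(centroids):
    def mapper(shard):
        for point in shard:
            # assign the point to its nearest centroid by squared Euclidean distance
            cid = min(range(len(centroids)),
                      key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            yield cid, (point, 1)
    return mapper


def kmeans_reducer(cid, values):
    # sum the coordinates and counts of all points assigned to this centroid, then average
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + p for t, p in zip(total, point)]
        count += n
    return [t / count for t in total]


def kmeans_step(shards, centroids):
    updated = run_map_reduce(shards, kmeans_mapper(centroids), kmeans_reducer)
    # a centroid that received no points keeps its previous position
    return [updated.get(c, centroids[c]) for c in range(len(centroids))]


# ---- Logistic Regression: one batch gradient-descent step = one MapReduce pass ----
def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logreg_mapper(weights):
    def mapper(shard):
        # partial gradient of the log-loss computed over this shard only
        grad = [0.0] * len(weights)
        for x, y in shard:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            grad = [g + err * xi for g, xi in zip(grad, x)]
        yield "grad", (grad, len(shard))
    return mapper


def logreg_reducer(key, values):
    # add up the partial gradients and example counts from every shard
    total = [sum(parts) for parts in zip(*(g for g, _ in values))]
    return total, sum(n for _, n in values)


def logreg_step(shards, weights, lr=0.5):
    grad, n = run_map_reduce(shards, logreg_mapper(weights), logreg_reducer)["grad"]
    # update with the gradient averaged over all n examples
    return [w - lr * g / n for w, g in zip(weights, grad)]


if __name__ == "__main__":
    shards = [[(0.0, 0.0), (0.5, 0.5)], [(9.0, 9.0), (9.5, 9.0)]]
    print(kmeans_step(shards, [[0.0, 1.0], [8.0, 8.0]]))  # [[0.25, 0.25], [9.25, 9.0]]
```

In both algorithms, each map task needs only the current centroids or weights plus its own shard. The reduce step only sums fixed-size partial results, which is why these algorithms parallelize cleanly. On Hadoop, each iteration would typically run as a separate job, so the number of iterations largely determines the overall cost.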

Cited By

Lakshmi J, Sheshasaayee A (2015). Machine learning approaches on map reduce for Big Data analytics. Proceedings of the 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pp. 480-484. doi:10.1109/ICGCIoT.2015.7380512. Online publication date: 8 Oct 2015.
https://dl.acm.org/doi/10.1109/ICGCIoT.2015.7380512

Index Terms

1. Computing methodologies
  1. Machine learning
  2. Modeling and simulation
    1. Model development and analysis
2. Information systems
  1. Information retrieval
    1. Document representation

Published In

ACM Other conferences

WCI '15: Proceedings of the Third International Symposium on Women in Computing and Informatics

August 2015

763 pages

ISBN:9781450333610

DOI:10.1145/2791405

Editor:
Indu Nair, SCMS, Kochi, India

General Chairs:
Sushmita Mitra, Indian Statistical Institute, Kolkata, India
Ljiljana Trajković, Simon Fraser University, Canada

Program Chairs:
Punam Bedi, University of Delhi, India
Suzanne McIntosh, New York University and Cloudera Inc., USA
M. S. Rajasree, IIITM-K, Trivandrum, India

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WCI '15

WCI '15: Third International Symposium on Women in Computing and Informatics

August 10 - 13, 2015

Kochi, India

Acceptance Rates

WCI '15 Paper Acceptance Rate 98 of 452 submissions, 22%;

Overall Acceptance Rate 98 of 452 submissions, 22%

Bibliometrics & Citations

Total Citations: 1
Total Downloads: 229
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Jan 2025
