research-article

MR-runner: a modularized map-reduce job management tool

Authors: Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei

Internetware '13: Proceedings of the 5th Asia-Pacific Symposium on Internetware

Article No.: 19, Pages 1 - 4

https://doi.org/10.1145/2532443.2532474

Published: 23 October 2013

Abstract

Map-Reduce is a powerful solution for processing and analyzing large-scale data. Frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to write the "map" and "reduce" functions, and the Map-Reduce framework can then carry out a wide variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after substantial effort to modify the algorithm itself. This issue can be explained in two ways: (1) Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration; (2) Map-Reduce is fully parallel, so each vertex cannot access all records and none of them can obtain the globally optimal model. In this paper, we propose a job management tool, MR-Runner, that enables the Map-Reduce framework to support iteration, which we call "de-parallel". This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a "client", so it can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
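The client-side iteration idea described in the abstract can be illustrated with a minimal Python sketch. It is not MR-Runner's actual API, which this page does not show. The names `MapReduceBackend`, `LocalBackend`, `run_job`, and `kmeans_1d` are all hypothetical, and an in-memory backend stands in for a real cluster client. The sketch assumes a driver that submits one map-reduce job per iteration and checks convergence on the client, where the aggregated result of every reducer is visible.

```python
from collections import defaultdict
from typing import Protocol


class MapReduceBackend(Protocol):
    """Hypothetical framework-facing interface (an assumption, not MR-Runner's API)."""

    def run_job(self, records, mapper, reducer) -> dict: ...


class LocalBackend:
    """In-memory stand-in for a client that submits jobs to a real cluster."""

    def run_job(self, records, mapper, reducer) -> dict:
        # Map phase: each record emits (key, value) pairs.
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
        # Reduce phase: one reducer call per key.
        return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(backend, records, centers, max_iters=20, tol=1e-9):
    """Client-side driver: runs one map-reduce job per k-means iteration."""
    for _ in range(max_iters):
        def mapper(x, centers=centers):
            # Assign the record to the index of its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            yield nearest, x

        def reducer(key, values):
            # The new center is the mean of the records assigned to it.
            return sum(values) / len(values)

        result = backend.run_job(records, mapper, reducer)
        # The driver sees every reducer's output, so it can build the global
        # model and decide whether another iteration is needed.
        new_centers = [result.get(i, c) for i, c in enumerate(centers)]
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(LocalBackend(), data, [0.0, 5.0]))
```

Because the driver depends only on the `run_job` interface, a different backend could in principle be swapped in without changing the iteration logic, which is the kind of portability the abstract attributes to abstracting the framework-related interfaces.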

Published In

Internetware '13: Proceedings of the 5th Asia-Pacific Symposium on Internetware, October 2013, 211 pages

ISBN: 9781450323697

DOI: 10.1145/2532443

Conference Chairs: Hong Mei, Jian Lv

Program Chair: Xiaoguang Mao

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

Internetware '13: The Fifth Asia-Pacific Symposium on Internetware

October 23 - 24, 2013

Changsha, China

Acceptance Rates

Internetware '13 Paper Acceptance Rate 15 of 50 submissions, 30%;

Overall Acceptance Rate 55 of 111 submissions, 50%
