DOI: 10.1145/2532443.2532474
research-article

MR-runner: a modularized map-reduce job management tool

Published: 23 October 2013

Abstract

Map-Reduce is a powerful solution for processing and analyzing large-scale data; frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to implement the "map" and "reduce" functions, and the Map-Reduce framework can then carry out a wide variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after substantial modification of the algorithm itself. This issue has two causes: (1) Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration; (2) Map-Reduce is fully parallel, so each vertex cannot obtain all records and none of them can compute the globally optimal model. In this paper, we propose a job management tool, MR-Runner, that enables the Map-Reduce framework to support iteration, an approach called "de-parallel". This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a "client", so MR-Runner can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across the representative Map-Reduce frameworks.
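The client-style iteration described in the abstract can be illustrated with a minimal sketch. This is not MR-Runner's actual code or API, which the abstract does not show. Every name below (`MapReduceBackend`, `LocalBackend`, `run_iterative`, and the 1-D k-means helpers) is hypothetical, and the in-memory backend merely stands in for a real framework such as Hadoop. The sketch assumes a driver that repeatedly submits an unmodified map-reduce job, gathers the reduced output in one place to update a global model, and stops once the model no longer changes.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class MapReduceBackend(ABC):
    """Hypothetical minimal interface a client-side driver would need."""

    @abstractmethod
    def run_job(self, map_fn, reduce_fn, records):
        """Run one map-reduce pass and return {key: reduced_value}."""


class LocalBackend(MapReduceBackend):
    """In-memory stand-in for a real cluster, for illustration only."""

    def run_job(self, map_fn, reduce_fn, records):
        groups = defaultdict(list)
        for record in records:
            for key, value in map_fn(record):
                groups[key].append(value)
        return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_iterative(backend, records, model, make_map, reduce_fn, update,
                  converged, max_iters=50):
    """Client-side loop: submit a job, fold its output into the model, repeat."""
    for iteration in range(1, max_iters + 1):
        output = backend.run_job(make_map(model), reduce_fn, records)
        new_model = update(model, output)
        if converged(model, new_model):
            return new_model, iteration
        model = new_model
    return model, max_iters


# Illustrative workload: 1-D k-means, where the model is the centroid list.
def make_map(centroids):
    def map_fn(x):
        # Emit (index of nearest centroid, point).
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        yield nearest, x
    return map_fn


def reduce_fn(key, values):
    # New centroid for one cluster is the mean of its points.
    return sum(values) / len(values)


def update(centroids, output):
    # Keep the old centroid if a cluster received no points.
    return [output.get(i, c) for i, c in enumerate(centroids)]


def converged(old, new):
    return all(abs(a - b) < 1e-9 for a, b in zip(old, new))


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, iterations = run_iterative(
    LocalBackend(), points, [0.0, 5.0], make_map, reduce_fn, update, converged
)
print(centroids, iterations)
```

Because the driver depends only on the `run_job` method, swapping `LocalBackend` for an adapter around another framework would leave the loop unchanged, which is one plausible reading of the portability claim in the abstract.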


Published In

Internetware '13: Proceedings of the 5th Asia-Pacific Symposium on Internetware
October 2013
211 pages
ISBN:9781450323697
DOI:10.1145/2532443
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • NJU: Nanjing University
  • CCF: China Computer Federation
  • Chinese Academy of Sciences

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. iteration
  2. job management
  3. map-reduce
  4. modularization

Qualifiers

  • Research-article

Conference

Internetware '13
Sponsor:
  • NJU
  • CCF

Acceptance Rates

Internetware '13 Paper Acceptance Rate 15 of 50 submissions, 30%;
Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics

  • Total Citations: 0
  • Total Downloads: 114
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Feb 2025
