ABSTRACT
Stock Hadoop is a reliable, scalable, open-source implementation of the MapReduce framework for processing data-intensive applications in a distributed, parallel environment. In a shared cluster serving multiple users with diverse applications, the number of jobs typically exceeds the available resources, so jobs run in multiple waves. Shuffling, the longest phase of a job, has the most adverse effect on job execution time because of the network traffic it generates. On one hand, because the shuffle phase is bound to reduce tasks, it cannot start until a reduce task has been scheduled; on the other hand, the static scheduling of reduce tasks wastes reduce slots. This paper presents our ongoing effort to design an intelligent service in which the sort/merge and shuffle phases are fully decoupled from the map and reduce phases and can run in parallel with them. This parallelism shortens job completion time.
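The decoupling described above can be illustrated with a minimal sketch. All names here (`shuffle_service`, `map_task`, the per-partition queues) are hypothetical and are not the paper's actual implementation: each map task pushes its partitioned output to a standalone shuffle service as records are emitted, so merging overlaps the map wave instead of waiting for a reduce task to be scheduled and start pulling.

```python
# Hypothetical sketch of a shuffle service decoupled from reduce tasks:
# map tasks push partitioned output into per-partition queues, and
# shuffle threads merge the data in parallel with the running maps.
import threading
import queue
from collections import defaultdict

NUM_REDUCERS = 2

# One queue per reduce partition; the "shuffle service" drains them
# concurrently with the still-running map tasks.
shuffle_queues = [queue.Queue() for _ in range(NUM_REDUCERS)]
merged = [defaultdict(list) for _ in range(NUM_REDUCERS)]

def map_task(records):
    # Standard word-count map: emit (word, 1), partition by hash.
    for word in records:
        part = hash(word) % NUM_REDUCERS
        shuffle_queues[part].put((word, 1))

def shuffle_service(part):
    # Runs in parallel with the map tasks: merges pairs as they arrive,
    # independently of whether any reduce task has been scheduled yet.
    while True:
        item = shuffle_queues[part].get()
        if item is None:          # sentinel: all maps finished
            break
        key, value = item
        merged[part][key].append(value)

def reduce_task(part):
    # By the time a reduce slot frees up, its input is already merged.
    return {k: sum(v) for k, v in merged[part].items()}

# Drive one tiny job: two map tasks, two reduce partitions.
shufflers = [threading.Thread(target=shuffle_service, args=(p,))
             for p in range(NUM_REDUCERS)]
for t in shufflers:
    t.start()

maps = [threading.Thread(target=map_task, args=(recs,))
        for recs in (["a", "b", "a"], ["b", "c"])]
for t in maps:
    t.start()
for t in maps:
    t.join()
for q in shuffle_queues:
    q.put(None)                   # signal end of the map wave
for t in shufflers:
    t.join()

result = {}
for p in range(NUM_REDUCERS):
    result.update(reduce_task(p))
```

In stock Hadoop, by contrast, the merge work modeled by `shuffle_service` only begins after a reduce task is scheduled and starts fetching map output, which serializes shuffling behind reduce-slot availability.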
Index Terms
- POSTER: An Intelligent Framework to Parallelize Hadoop Phases