research-article

AEGEUS: An online partition skew mitigation algorithm for mapreduce

Authors:
Vimalkumar Kumaresan, College of Engineering Guindy, Anna University, India
R. Baskaran, College of Engineering Guindy, Anna University, India

ICIA-16: Proceedings of the International Conference on Informatics and Analytics, August 2016
Article No.: 100, Pages 1–8
https://doi.org/10.1145/2980258.2980461

Published: 25 August 2016


Editorial Notes

NOTICE OF CONCERN: ACM has received evidence that casts doubt on the integrity of the peer review process for the ICIA 2016 Conference. As a result, ACM is issuing a Notice of Concern for all papers published and strongly suggests that the papers from this Conference not be cited in the literature until ACM's investigation has concluded and final decisions have been made regarding the integrity of the peer review process for this Conference.

ABSTRACT

This paper investigates the partition skew problem in the reduce phase of MapReduce jobs. Existing studies on Hadoop address this problem in either an offline or an online manner. The offline approach is heuristic-based: it must wait for the map tasks to complete and incurs computation overhead to estimate partition sizes. The other approach distributes overloaded tasks across other nodes, which requires extra split and merge operations. These extra operations, in turn, hamper system performance. In this paper, we propose Aegeus, an online, streaming-based skew mitigation approach for MapReduce jobs that avoids both the long waiting time and the extra operations. Aegeus predicts the partition size for each map task and creates a resource specification based on that requirement even before the map phase completes. Hence, the proposed system can create containers sized to the workload, which can improve overall job completion time and system performance. We evaluated Aegeus using benchmark datasets and compared its performance with naive Hadoop. Based on our observations, Aegeus outperforms naive Hadoop by 42%, improving the overall performance of both the application and the system.
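
The abstract does not say how Aegeus predicts partition sizes or sizes its containers, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each finished map task reports how many bytes it wrote to each reduce partition, extrapolates the final partition sizes linearly from the fraction of map tasks finished so far, and scales a container's memory in proportion to its predicted partition size. All names (`PartitionSizeEstimator`, `size_containers`, `base_mb`, `max_mb`) are hypothetical and do not come from Hadoop or the paper.

```python
import math


class PartitionSizeEstimator:
    """Hypothetical online estimator (illustration only, not Aegeus itself)."""

    def __init__(self, num_map_tasks, num_partitions):
        self.num_map_tasks = num_map_tasks
        self.observed = [0] * num_partitions  # bytes seen per partition so far
        self.completed_maps = 0

    def record_map_output(self, bytes_per_partition):
        """Fold in the per-partition output sizes of one finished map task."""
        for p, size in enumerate(bytes_per_partition):
            self.observed[p] += size
        self.completed_maps += 1

    def predict(self):
        """Assumed model: scale observed bytes by total maps / finished maps."""
        if self.completed_maps == 0:
            raise ValueError("no map output observed yet")
        scale = self.num_map_tasks / self.completed_maps
        return [size * scale for size in self.observed]


def size_containers(predicted, base_mb=1024, max_mb=8192):
    """Assumed policy: memory proportional to predicted size relative to the
    mean partition, clamped to the range [base_mb, max_mb]."""
    mean = sum(predicted) / len(predicted)
    if mean == 0:
        return [base_mb] * len(predicted)
    return [
        min(max_mb, max(base_mb, math.ceil(base_mb * size / mean)))
        for size in predicted
    ]


if __name__ == "__main__":
    est = PartitionSizeEstimator(num_map_tasks=4, num_partitions=2)
    est.record_map_output([100, 300])  # first finished map task
    est.record_map_output([200, 500])  # second finished map task
    predicted = est.predict()          # available before the map phase ends
    print(predicted, size_containers(predicted))
```

In this sketch a prediction is available as soon as the first map task finishes, which mirrors the abstract's point that resource specifications can be created before the map phase completes. A real system would likely need a more robust predictor than linear scaling.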


Published in
ICIA-16: Proceedings of the International Conference on Informatics and Analytics
August 2016
868 pages
ISBN:9781450347563
DOI:10.1145/2980258
Conference Chairs:
V. Akila,
N. Sivakumar,
K. Saruladha,
G. Zayaraz,
E. Ilavarasan
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MapReduce
big data processing
cloud computing
online load balancing algorithms
partitioning skew
Qualifiers
- research-article
- Research
- Refereed limited
