Survey on improving the performance of MapReduce in Hadoop

Authors:

Nour-Eddine Bakni,

Ismail Assayad

NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security

Article No.: 36, Pages 1 - 5

https://doi.org/10.1145/3454127.3456617

Published: 26 November 2021

Abstract

Hadoop has become the most popular and most widely used platform for distributed data processing. It is open-source software that implements the MapReduce model for processing big data, and it has played a large part in scientific research in the field. Much of that research has addressed allocation and scheduling in the Hadoop system. In this paper, we present the main research on improving the performance of the MapReduce model on the Hadoop platform. Most previous surveys focused only on Hadoop MapReduce scheduling and how to improve it; this paper instead gives an overview of the important work that aims to improve the performance of Hadoop MapReduce from different perspectives (energy, budget, scheduling, makespan, …).

Cited By

Hedayati S, Maleki N, Olsson T, Ahlgren F, Seyednezhad M, Berahmand K (2023). MapReduce scheduling algorithms in Hadoop: a systematic study. Journal of Cloud Computing: Advances, Systems and Applications, 12:1. DOI: 10.1186/s13677-023-00520-9. Online publication date: 10-Oct-2023.
https://dl.acm.org/doi/10.1186/s13677-023-00520-9
Suresh S, Rajesh Kumar T, Nagalakshmi M, Bennilo Fernandes J, Kavitha S (2022). Hadoop Map Reduce Techniques: Simplified Data Processing on Large Clusters with Data Mining. 2022 Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 420-423. DOI: 10.1109/I-SMAC55078.2022.9986501. Online publication date: 10-Nov-2022.
https://doi.org/10.1109/I-SMAC55078.2022.9986501

Published In

NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security

April 2021

410 pages

ISBN:9781450388719

DOI:10.1145/3454127

Editors:
Pr. Ben Ahmed Mohamed, FSTT/UAE, Tangier, Morocco
Pr. Boudhir Anouar Abdelhakim, FSTT/UAE, Tangier, Morocco
Pr. Tomader Mazri, ENSAK ITU, Kenitra, Morocco

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

NISS2021

NISS2021: The 4th International Conference on Networking, Information Systems & Security.

April 1 - 2, 2021

AA, KENITRA, Morocco
