Survey on improving the performance of MapReduce in Hadoop

Published: 26 November 2021

Abstract

Hadoop has become the most popular and most widely used platform for distributed data processing. It is open-source software that implements the MapReduce model for processing big data, and it has featured prominently in big data research, much of which has addressed allocation and scheduling in Hadoop systems. In this paper, we present the main research on improving the performance of the MapReduce model on the Hadoop platform. Most previous surveys focused only on Hadoop MapReduce scheduling and how to improve it; this paper instead gives an overview of the important work that aims to improve the performance of Hadoop MapReduce from several angles (energy, budget, scheduling, makespan, ...).

Published In

NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security
April 2021
410 pages
ISBN:9781450388719
DOI:10.1145/3454127

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Big data
  2. Hadoop
  3. Job Scheduling
  4. MapReduce

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

NISS2021