Delay Scheduling with Reduced Workload on JobTracker in Hadoop

Sethi, Krishan Kumar; Ramesh, Dharavath

doi:10.1007/978-3-319-28031-8_32

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 424)

Abstract

Job scheduling is a critical issue in MapReduce processing and directly affects the performance of the Hadoop framework. Delay scheduling introduces a small delay during job scheduling to improve data locality. The delay scheduler may scan a job more than once before a deadline is reached, after which the job is scheduled regardless of locality. This repeated scanning places extra overhead on the scheduler, and a higher-priority job may also be delayed. We propose an algorithm in which the load is distributed among the individual nodes. Our algorithm requires the scheduler to launch a high-priority job on a free node. The node then either executes the job locally or schedules it on another node, depending on where the data is available. Experimental results show that the proposed algorithm performs better than Hadoop and achieves lower execution time.
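
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no pseudocode, so the names `Job`, `pick_job`, `node_dispatch`, and `max_skips` are hypothetical, the deadline is modelled as a skip count rather than a time limit, and the fallback of running on the receiving node when no free node holds the data is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    data_nodes: Set[str]  # nodes assumed to hold this job's input data
    skips: int = 0        # times the scheduler has passed over this job


def pick_job(queue: List[Job], free_node: str,
             max_skips: int) -> Optional[Tuple[Job, bool]]:
    """Baseline delay scheduling, modelled with a skip counter.

    Returns (job, is_local), or None if every queued job is still delayed.
    """
    for job in queue:
        if free_node in job.data_nodes:
            queue.remove(job)
            return job, True       # data-local launch
        if job.skips >= max_skips:
            queue.remove(job)
            return job, False      # deadline reached: launch without locality
        job.skips += 1             # scanned again: the overhead the abstract notes
    return None


def node_dispatch(job: Job, node: str, free_nodes: List[str]) -> str:
    """Node-side decision sketched from the abstract (details are assumed).

    The receiving node runs the job if it holds the data; otherwise it
    forwards the job to a free node that does. Falling back to the
    receiving node when no such node exists is an assumption.
    """
    if node in job.data_nodes:
        return node
    for other in free_nodes:
        if other in job.data_nodes:
            return other
    return node
```

In this sketch `pick_job` rescans a non-local job on every scheduling opportunity until `max_skips` is reached, whereas `node_dispatch` makes a single placement decision on the node itself, which is how the abstract describes moving work off the central scheduler.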

Acknowledgements

This research work is supported by the Department of Computer Science & Engineering, Indian School of Mines, Dhanbad, India.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, 826004, Jharkhand, India
Krishan Kumar Sethi & Dharavath Ramesh

Authors

Krishan Kumar Sethi
Dharavath Ramesh

Corresponding author

Correspondence to Dharavath Ramesh.

Editor information

Editors and Affiliations

Dep. of Computer Science, VŠB – Technical Univ. of Ostrava, Ostrava, Czech Republic
Václav Snášel
(MIR Labs), Scientific Net Innov & Res Excel, Auburn, Washington, USA
Ajit Abraham
Faculty of Elec. Eng. & Comp. Sci., VŠB - Technical University of Ostrava, Ostrava-Poruba, Czech Republic
Pavel Krömer
Department of Paper Technology, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Millie Pant
Fac of Info & Comm, Comp Inte & Tech Lab, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Malaysia
Azah Kamilah Muda

Cite this paper

Sethi, K.K., Ramesh, D. (2016). Delay Scheduling with Reduced Workload on JobTracker in Hadoop. In: Snášel, V., Abraham, A., Krömer, P., Pant, M., Muda, A. (eds) Innovations in Bio-Inspired Computing and Applications. Advances in Intelligent Systems and Computing, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-319-28031-8_32

DOI: https://doi.org/10.1007/978-3-319-28031-8_32
Published: 15 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28030-1
Online ISBN: 978-3-319-28031-8
eBook Packages: Engineering, Engineering (R0)
