HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs

The Journal of Supercomputing

Abstract

Large-scale MapReduce clusters that routinely process big data pose challenges for cloud computing. One key challenge is reducing the response time of these clusters by minimizing the makespan of the jobs they run. The order in which jobs are executed can significantly affect both the overall makespan and resource utilization. In this work, we consider a scheduling model for multiple MapReduce jobs, with the goal of designing a job scheduler that minimizes the makespan of such a set of jobs. We build on the classical Johnson model and propose a novel framework, HScheduler, which combines features of the classical Johnson's algorithm and MapReduce to minimize the makespan for both offline and online jobs. Our Offline HScheduler reaches the theoretical lower bound (optimum), and Online HScheduler is 2-competitive, which is the best-known constant ratio for minimizing the makespan. In extensive tests on real data, HScheduler outperforms the best-known approach by 10.6–11.7 % on average for offline scheduling and by 8–10 % on average for online scheduling. HScheduler can be applied to improve response time, throughput and energy efficiency in cloud computing.
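
To make the role of job ordering concrete, the following is a minimal sketch of the classical Johnson's rule for a two-stage flow shop, which the abstract names as the starting point. It is not the HScheduler algorithm itself. It assumes a simplified model in which the map phase and the reduce phase each behave like a single machine that processes one job at a time, and the job names and durations are invented for illustration.

```python
from typing import List, Sequence, Tuple

# (name, map_time, reduce_time); all values here are hypothetical
Job = Tuple[str, float, float]


def johnson_order(jobs: Sequence[Job]) -> List[Job]:
    """Order jobs by the classical Johnson's rule for two stages.

    Jobs whose first stage is no longer than their second stage go first,
    sorted by increasing first-stage time. The remaining jobs go last,
    sorted by decreasing second-stage time.
    """
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(order: Sequence[Job]) -> float:
    """Completion time of the last reduce stage under the assumed model."""
    map_end = 0.0     # time at which the map stage becomes free
    reduce_end = 0.0  # time at which the reduce stage becomes free
    for _, map_time, reduce_time in order:
        map_end += map_time
        # A job's reduce stage starts once its own map stage is done
        # and the previous job's reduce stage has finished.
        reduce_end = max(reduce_end, map_end) + reduce_time
    return reduce_end


jobs = [("J1", 4, 5), ("J2", 4, 1), ("J3", 30, 4), ("J4", 6, 30), ("J5", 2, 3)]
ordered = johnson_order(jobs)
print([name for name, _, _ in ordered], makespan(ordered))
print("submission order:", makespan(jobs))
```

Under these assumptions, the rule places J5, J1, J4, J3, J2 in that order and gives a makespan of 47 time units, compared with 77 when the same jobs run in submission order. Real MapReduce jobs share map and reduce slots across many tasks, so this two-machine abstraction only approximates the setting the paper addresses.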

Acknowledgments

This research is partially supported by the China National Science Foundation (CNSF) under project ID 61450110440, the Sichuan Province Technology Plan (ID 2016GZ0322), and the Chongqing Research Program of Basic Research and Frontier Technology (ID cstc2015jcyjB0244). Prof. Wenhong Tian completed most of this work while he was a visiting fellow at the CLOUDS lab, led by Prof. Rajkumar Buyya, at the University of Melbourne, Australia. The author thanks the CLOUDS team members for their comments, which helped polish the manuscript.

Author information

Corresponding author

Correspondence to Wenhong Tian.

Additional information

This research is sponsored by the Natural Science Foundation of China (NSFC) Grant 61450110440.

About this article

Cite this article

Tian, W., Li, G., Yang, W. et al. HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs. J Supercomput 72, 2376–2393 (2016). https://doi.org/10.1007/s11227-016-1737-4

  • DOI: https://doi.org/10.1007/s11227-016-1737-4
