An optimized MapReduce workflow scheduling algorithm for heterogeneous computing

Tang, Zhuo; Liu, Min; Ammar, Almoalmi; Li, Kenli; Li, Keqin

doi:10.1007/s11227-014-1335-2

Published: 19 November 2014

Volume 72, pages 2059–2079, (2016)

The Journal of Supercomputing

Abstract

The MapReduce framework is widely regarded as an effective solution for large-scale parallel data processing. This paper models a massive data processing workflow as a directed acyclic graph (DAG) of MapReduce jobs. In a heterogeneous computing environment, the computation speed of a single slot can vary from job to job. To address this problem, this paper proposes an optimized MapReduce workflow scheduling algorithm comprising a job prioritizing phase and a task assignment phase. First, jobs are classified as I/O-intensive or computing-intensive, and the priority of each job is computed according to its type. Then, suitable slots are allocated for each block, and the MapReduce tasks in the workflow are scheduled with respect to data locality. The experimental results show that the optimized MapReduce workflow scheduling algorithm can improve the performance of task scheduling and the rationality of resource allocation in heterogeneous computing.
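The two phases named in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, whose details are not given on this page. It assumes a priority based on an upward rank over the job DAG (average type-specific run time plus the longest path to an exit job) and an earliest-finish-time slot choice with a fixed penalty for non-local data. The job names, slot speeds, block locations, and the `REMOTE_PENALTY` value are all hypothetical.

```python
from functools import lru_cache

# Hypothetical workflow: job -> (type, work units, successor jobs).
JOBS = {
    "extract": ("io", 4.0, ["clean", "index"]),
    "clean": ("cpu", 6.0, ["report"]),
    "index": ("io", 3.0, ["report"]),
    "report": ("cpu", 2.0, []),
}

# Hypothetical heterogeneous slots: host node and speed per job type
# (work units per second), so one slot runs different jobs at different rates.
SLOTS = {
    "s1": {"node": "n1", "io": 2.0, "cpu": 1.0},
    "s2": {"node": "n2", "io": 1.0, "cpu": 3.0},
}

# Hypothetical location of each job's input block.
BLOCK_NODE = {"extract": "n1", "clean": "n2", "index": "n1", "report": "n2"}
REMOTE_PENALTY = 1.0  # assumed extra seconds when the block is not local


def mean_cost(job):
    """Average run time of a job over all slots, using its type's speed."""
    kind, work, _ = JOBS[job]
    return sum(work / slot[kind] for slot in SLOTS.values()) / len(SLOTS)


@lru_cache(maxsize=None)
def rank(job):
    """Upward rank: own mean cost plus the largest rank among successors."""
    successors = JOBS[job][2]
    return mean_cost(job) + max((rank(s) for s in successors), default=0.0)


def prioritize():
    """Phase 1 (assumed form): order jobs by decreasing upward rank."""
    return sorted(JOBS, key=rank, reverse=True)


def schedule():
    """Phase 2 (assumed form): give each job the slot that finishes it first."""
    preds = {j: [p for p in JOBS if j in JOBS[p][2]] for j in JOBS}
    slot_free = {name: 0.0 for name in SLOTS}
    finish, placement = {}, {}
    for job in prioritize():
        kind, work, _ = JOBS[job]
        # A job may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in preds[job]), default=0.0)
        best = None
        for name, slot in SLOTS.items():
            start = max(ready, slot_free[name])
            local = slot["node"] == BLOCK_NODE[job]
            end = start + (0.0 if local else REMOTE_PENALTY) + work / slot[kind]
            if best is None or end < best[0]:
                best = (end, name)
        finish[job], placement[job] = best
        slot_free[best[1]] = best[0]
    return placement, finish


if __name__ == "__main__":
    placement, finish = schedule()
    for job in prioritize():
        print(job, placement[job], round(finish[job], 3))
```

In this toy setup the I/O-intensive jobs land on the slot with the faster I/O rate and the computing-intensive jobs on the slot with the faster CPU rate, which is also where their input blocks are located. A real scheduler would work at the level of map and reduce tasks and individual data blocks rather than whole jobs.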

Acknowledgments

The authors are grateful to the three anonymous reviewers for their criticism and comments, which have helped to improve the presentation and quality of the paper. This work is supported by the Key Program of National Natural Science Foundation of China (Grant Nos. 61133005, 61432005) and the National Natural Science Foundation of China (Grant Nos. 61103047, 61370095).

Author information

Authors and Affiliations

College of Information Science and Engineering, Hunan University, Changsha, 410082, China
Zhuo Tang, Min Liu, Almoalmi Ammar, Kenli Li & Keqin Li
Department of Computer Science, State University of New York, New Paltz, NY, 12561, USA
Keqin Li

Corresponding author

Correspondence to Zhuo Tang.

About this article

Cite this article

Tang, Z., Liu, M., Ammar, A. et al. An optimized MapReduce workflow scheduling algorithm for heterogeneous computing. J Supercomput 72, 2059–2079 (2016). https://doi.org/10.1007/s11227-014-1335-2

Published: 19 November 2014
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11227-014-1335-2
