Abstract
With the development of cloud computing, more and more data-intensive workflows are being deployed on virtualized datacenters. As a result, the energy spent on massive data access is growing rapidly. This paper proposes an energy-aware scheduling algorithm that introduces a novel heuristic, called Minimal Data-Accessing Energy Path, for scheduling data-intensive workflows with the aim of reducing the energy consumed by intensive data access. Extensive experiments on both synthetic and real workloads are conducted to investigate the effectiveness and performance of the proposed scheduling approach. The experimental results show that the proposed heuristic can significantly reduce the energy consumed in storing and retrieving the intermediate data generated during the execution of data-intensive workflows. In addition, it exhibits better robustness than existing algorithms when cloud systems are subject to I/O-intensive workloads.
Additional information
Supported by the National Natural Science Foundation of China under Grant Nos. 60970038, 61272148, the Science and Technology Plan Project of Hunan Province of China under Grant No. 2012GK3075, and the Scientific Research Fund of Hunan Provincial Education Department of China under Grant No. 13B015.
Cite this article
Xiao, P., Hu, ZG. & Zhang, YP. An Energy-Aware Heuristic Scheduling for Data-Intensive Workflows in Virtualized Datacenters. J. Comput. Sci. Technol. 28, 948–961 (2013). https://doi.org/10.1007/s11390-013-1390-9
DOI: https://doi.org/10.1007/s11390-013-1390-9