Two-level task scheduling with multi-objectives in geo-distributed and large-scale SaaS cloud

Zhang, Puheng; Ma, Xiao; Xiao, Yanping; Li, Wenzhuo; Lin, Chuang

doi:10.1007/s11280-019-00680-2

Published: 10 April 2019

World Wide Web, volume 22, pages 2291–2319 (2019)

Puheng Zhang¹ (ORCID: orcid.org/0000-0001-6773-5034), Xiao Ma², Yanping Xiao¹, Wenzhuo Li² & Chuang Lin²

Abstract

With the explosive growth of data-intensive web applications and requests (tasks), geo-distributed, large-scale data centers (DCs) are widely deployed in Software as a Service (SaaS) clouds, while server failures continue to grow. In this context, task scheduling becomes more intricate, and both scheduling quality and scheduling speed raise further concerns. In this paper, we first propose a virtualized and monitored SaaS model with predictive maintenance to minimize the cost of fault tolerance. Then, using the monitored and predicted availability states of servers, we focus on dynamic real-time task scheduling in geo-distributed, large-scale DCs with heterogeneous servers. To improve scheduling quality, we consider multiple objectives, including long-term performance benefits, energy costs, and communication costs. We formulate two dynamic programming problems, one for inter-DC and one for intra-DC task scheduling, but both the state and action spaces are too large to be solved by simple iteration. To address this issue, we introduce ideas from reinforcement learning into solving traditional stochastic dynamic programming problems in the large-scale SaaS cloud, and put forward a cascaded two-level (inter-DC and intra-DC) approximate dynamic programming (ADP) task-scheduling algorithm. This significantly reduces computational complexity and greatly improves scheduling speed. Finally, we conduct experiments with both randomly simulated data and Google cloud trace logs. QoS evaluations and comparisons demonstrate that the two ADP algorithms work cooperatively, and that our two-level ADP algorithm is more effective under large numbers of bursty requests.
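The cascaded two-level idea can be illustrated with a toy sketch. This is not the authors' algorithm, whose details are not given in the abstract; it only shows the general ADP pattern of choosing an action greedily against an approximate cost-to-go and then smoothing that approximation toward the observed cost. The `Level` class, the `schedule` function, the lookup-table value approximation, the cost inputs, and the step-size and discount values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One scheduling level with a lookup-table estimate of cost-to-go (assumed form)."""
    n: int                 # number of choices at this level (DCs or servers)
    alpha: float = 0.2     # smoothing step size (illustrative value)
    gamma: float = 0.9     # discount factor (illustrative value)
    value: list = field(default_factory=list)

    def __post_init__(self):
        if not self.value:
            self.value = [0.0] * self.n

    def choose(self, cost):
        # Greedy choice: immediate cost plus discounted approximate future cost.
        return min(range(self.n), key=lambda a: cost[a] + self.gamma * self.value[a])

    def update(self, action, observed):
        # Move the stored estimate a fraction alpha toward the observed cost.
        self.value[action] = (1 - self.alpha) * self.value[action] + self.alpha * observed


def schedule(inter, intra, comm_cost, energy_cost, queues):
    """Place one task: pick a DC first (inter level), then a server inside it (intra level)."""
    dc = inter.choose(comm_cost)
    server_cost = [energy_cost[dc][s] + queues[dc][s] for s in range(intra[dc].n)]
    srv = intra[dc].choose(server_cost)
    observed = server_cost[srv]
    intra[dc].update(srv, observed)                # intra-DC estimate learns from server cost
    inter.update(dc, comm_cost[dc] + observed)     # inter-DC estimate also sees communication cost
    queues[dc][srv] += 1                           # the task now occupies that server's queue
    return dc, srv
```

Calling `schedule` repeatedly with one `Level` for the DCs and one `Level` per DC for its servers places tasks one at a time. Because each decision evaluates only the choices at one level rather than every (DC, server) pair jointly, the per-task work stays small, which is the intuition behind splitting the problem into two cascaded levels.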

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61472199 and No. 61370132).

Author information

Authors and Affiliations

Logistical Research Institute of Science and Technology, Beijing, China
Puheng Zhang & Yanping Xiao
Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
Xiao Ma, Wenzhuo Li & Chuang Lin

Corresponding author

Correspondence to Puheng Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, P., Ma, X., Xiao, Y. et al. Two-level task scheduling with multi-objectives in geo-distributed and large-scale SaaS cloud. World Wide Web 22, 2291–2319 (2019). https://doi.org/10.1007/s11280-019-00680-2

Received: 18 December 2017
Revised: 15 February 2019
Accepted: 28 March 2019
Published: 10 April 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s11280-019-00680-2
