Abstract
With the explosive growth of data-intensive web applications and requests (tasks), geo-distributed, large-scale data centers (DCs) are widely deployed in Software as a Service (SaaS) clouds, but server failures continue to grow at the same time. In this context, task scheduling becomes more intricate, and both scheduling quality and scheduling speed become pressing concerns. In this paper, we first propose a virtualized, monitoring-based SaaS model with predictive maintenance to minimize the cost of fault tolerance. Then, using the monitored and predicted availability states of servers, we focus on dynamic real-time task scheduling in geo-distributed, large-scale DCs with heterogeneous servers. Multiple objectives, including long-term performance benefits, energy costs, and communication costs, are taken into consideration in order to improve scheduling quality. For inter-DC and intra-DC task scheduling, we formulate two dynamic programming problems, one for each level, but both the state and action spaces are too large for these problems to be solved by simple iteration. To address this issue, we apply ideas from reinforcement learning to traditional stochastic dynamic programming problems in the large-scale SaaS cloud, and put forward a cascaded two-level (inter-DC and intra-DC) approximate dynamic programming (ADP) task-scheduling algorithm. The computational complexity can be significantly reduced and the scheduling speed greatly improved. Finally, we conduct experiments with both random simulation data and Google cloud trace-logs. QoS evaluations and comparisons demonstrate that the two ADP algorithms can work cooperatively, and that our two-level ADP algorithm is more effective under large volumes of bursty requests.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61472199 and No. 61370132).
About this article
Cite this article
Zhang, P., Ma, X., Xiao, Y. et al. Two-level task scheduling with multi-objectives in geo-distributed and large-scale SaaS cloud. World Wide Web 22, 2291–2319 (2019). https://doi.org/10.1007/s11280-019-00680-2
DOI: https://doi.org/10.1007/s11280-019-00680-2