Abstract
As large numbers of heterogeneous processors are deployed in service-oriented cloud computing systems, random hardware failures of processors are becoming an increasingly prominent problem. Replication-based fault-tolerant task assignment is a common approach to satisfying an application's reliability requirement. However, state-of-the-art algorithms suffer from either high redundancy or low time efficiency. In this work, we propose a fast task assignment for minimizing redundancy (FTAMR) algorithm that satisfies the reliability requirement of a parallel application, modeled as a directed acyclic graph (DAG), on heterogeneous service-oriented cloud computing systems. First, FTAMR quickly identifies the tasks that need to be replicated. Second, it quickly maps the selected tasks to their most suitable processors. It then repeats these steps until the application's reliability meets the given reliability requirement. Experimental results on real and synthetically generated parallel applications of different scales, degrees of parallelism, and degrees of heterogeneity show that FTAMR achieves the lowest redundancy and the highest time efficiency among the compared state-of-the-art fault-tolerant algorithms.
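The replicate-and-map loop described above can be illustrated with a minimal Python sketch. This is not the FTAMR algorithm itself. It assumes a reliability model commonly used in this line of work: processors fail independently at constant rates λ, so one copy of a task with execution time w succeeds with probability exp(-λw); a replicated task fails only if all of its copies fail; and the application succeeds only if every task succeeds. In place of FTAMR's own heuristics, it uses a simple hypothetical greedy rule (replicate the currently least reliable task on the most reliable processor it does not yet use) and ignores DAG precedence and scheduling length. All function names and inputs are illustrative.

```python
import math
from typing import Dict, Iterable, List, Optional


def copy_reliability(failure_rate: float, exec_time: float) -> float:
    # Probability that one copy finishes without a processor failure,
    # assuming a constant failure rate (exponential model): exp(-lambda * w).
    return math.exp(-failure_rate * exec_time)


def replicated_reliability(copy_rels: Iterable[float]) -> float:
    # A replicated task fails only if every independent copy fails.
    all_fail = 1.0
    for r in copy_rels:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def greedy_replication(
    exec_times: List[List[float]],
    failure_rates: List[float],
    requirement: float,
) -> Optional[Dict[int, List[int]]]:
    """Hypothetical greedy sketch (not FTAMR): add replicas until the
    product of task reliabilities reaches `requirement`.

    exec_times[i][k] is the execution time of task i on processor k.
    Returns task -> list of processors, or None if even full replication
    cannot meet the requirement.
    """
    n_tasks, n_procs = len(exec_times), len(failure_rates)

    def rel_on(i: int, k: int) -> float:
        return copy_reliability(failure_rates[k], exec_times[i][k])

    # Initial mapping: each task gets one copy on its most reliable processor.
    assignment = {i: [max(range(n_procs), key=lambda k: rel_on(i, k))]
                  for i in range(n_tasks)}

    def task_rel(i: int) -> float:
        return replicated_reliability(rel_on(i, k) for k in assignment[i])

    while True:
        # Application reliability: every task must succeed (independent failures).
        app_rel = math.prod(task_rel(i) for i in range(n_tasks))
        if app_rel >= requirement:
            return assignment
        # Only tasks with at least one unused processor can be replicated further.
        candidates = [i for i in range(n_tasks) if len(assignment[i]) < n_procs]
        if not candidates:
            return None
        # Replicate the currently least reliable task ...
        weakest = min(candidates, key=task_rel)
        # ... on the most reliable processor it does not already use.
        unused = [k for k in range(n_procs) if k not in assignment[weakest]]
        assignment[weakest].append(max(unused, key=lambda k: rel_on(weakest, k)))
```

For example, with exec_times = [[10, 12, 15], [8, 9, 11]], failure_rates = [0.01, 0.002, 0.005] and a requirement of 0.99, the sketch first places both tasks on processor 1 (application reliability about 0.959), then adds one replica of each task on processor 2, reaching about 0.997. Each loop iteration adds exactly one replica, which is the quantity a redundancy-minimizing strategy tries to keep small.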
Acknowledgements
This work was supported in part by the Natural Science Foundation of Hunan Province, China, under Grant 2020JJ6063 and Grant 2019JJ50592, in part by the National Key Research and Development Program of China under Grant 2018YFB1003702, in part by the National Natural Science Foundation of China under Grant 61902336 and Grant 61703157, in part by the Hunan Province Science and Technology Project Funds under Grant 2018TP1036, and in part by the CERNET Innovation Project under Grant NGII20160310.
About this article
Cite this article
Zhu, J., Wang, L., Xie, G. et al. A low redundancy and high time efficiency large-scale task assignment strategy for heterogeneous service-oriented cloud computing systems. J Supercomput 77, 3450–3483 (2021). https://doi.org/10.1007/s11227-020-03403-x
DOI: https://doi.org/10.1007/s11227-020-03403-x