A low redundancy and high time efficiency large-scale task assignment strategy for heterogeneous service-oriented cloud computing systems

Zhu, Jiang; Wang, Lizan; Xie, Guoqi; Pei, Tingrui; Oh, Sangyoon; Li, Zhetao

doi:10.1007/s11227-020-03403-x

Published: 20 August 2020

Volume 77, pages 3450–3483, (2021)
The Journal of Supercomputing

Jiang Zhu^1,2,3 (ORCID: orcid.org/0000-0003-3570-7594),
Lizan Wang^1,2,3,
Guoqi Xie^4,
Tingrui Pei^1,2,3,
Sangyoon Oh^5 &
…
Zhetao Li^1,2,3

532 Accesses
2 Citations

Abstract

As large numbers of heterogeneous processors are deployed in service-oriented cloud computing systems, random hardware failures of processors are becoming increasingly prominent. Replication-based fault-tolerant task assignment is a common approach to satisfying an application's reliability requirement. However, state-of-the-art algorithms suffer from either high redundancy or low time efficiency. In this work, we propose a fast task assignment for minimizing redundancy (FTAMR) algorithm that satisfies the reliability requirement of a directed acyclic graph-based parallel application on heterogeneous service-oriented cloud computing systems. First, FTAMR quickly identifies the tasks that need to be replicated. Second, it quickly maps the selected tasks to their respective most suitable processors. It then repeats these steps until the application's reliability satisfies the specified reliability requirement. Experimental results on real and synthetically generated parallel applications at different scales, degrees of parallelism, and degrees of heterogeneity show that FTAMR achieves the lowest redundancy and the highest time efficiency among the compared state-of-the-art fault-tolerant algorithms.
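The identify, map, and repeat loop described in the abstract can be illustrated with a minimal Python sketch. It is not the FTAMR algorithm itself, whose selection and mapping rules are not given in this preview. The sketch assumes a reliability model that is common in this literature: a replica of task i on processor k succeeds with probability exp(-lambda_k * w_ik), a task succeeds if at least one of its replicas succeeds, and the application succeeds if all tasks succeed. The function names, the "replicate the weakest task" rule, and the "most reliable unused processor" rule are all hypothetical choices made for illustration.

```python
import math


def task_reliability(replica_rels):
    """Reliability of one task: at least one of its replicas succeeds."""
    fail = 1.0
    for r in replica_rels:
        fail *= 1.0 - r
    return 1.0 - fail


def assign_with_replication(exec_time, failure_rate, r_req):
    """Greedy illustration (assumed rules, not FTAMR).

    exec_time[i][k]  -- execution time of task i on processor k
    failure_rate[k]  -- constant failure rate (lambda) of processor k
    r_req            -- required application reliability in (0, 1]
    Returns (assignment, reliability), where assignment[i] lists the
    processors holding a replica of task i.
    """
    n_tasks = len(exec_time)
    n_procs = len(failure_rate)

    # Assumed model: replica reliability is exp(-lambda_k * w_ik).
    rel = [
        [math.exp(-failure_rate[k] * exec_time[i][k]) for k in range(n_procs)]
        for i in range(n_tasks)
    ]

    # Start with one replica per task on its most reliable processor.
    assignment = [
        [max(range(n_procs), key=lambda k, i=i: rel[i][k])]
        for i in range(n_tasks)
    ]

    def current_task_rel(i):
        return task_reliability(rel[i][k] for k in assignment[i])

    def app_reliability():
        prod = 1.0
        for i in range(n_tasks):
            prod *= current_task_rel(i)
        return prod

    # Repeat: pick a task to replicate, map the new replica, re-check.
    while app_reliability() < r_req:
        candidates = [i for i in range(n_tasks) if len(assignment[i]) < n_procs]
        if not candidates:
            raise ValueError("reliability requirement cannot be satisfied")
        weakest = min(candidates, key=current_task_rel)
        unused = [k for k in range(n_procs) if k not in assignment[weakest]]
        assignment[weakest].append(max(unused, key=lambda k: rel[weakest][k]))

    return assignment, app_reliability()


if __name__ == "__main__":
    # Two tasks, two processors; values are made up for the example.
    plan, achieved = assign_with_replication(
        exec_time=[[10, 20], [30, 20]],
        failure_rate=[0.001, 0.002],
        r_req=0.98,
    )
    print(plan, round(achieved, 4))
```

In this toy input only the second task receives an extra replica, which is the kind of outcome a redundancy-minimizing strategy aims for: replicas are added only where they are needed to reach the requirement.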

Acknowledgements

This work was supported in part by the Natural Science Foundation of Hunan Province, China, under Grant 2020JJ6063 and Grant 2019JJ50592, in part by the National Key Research and Development Program of China under Grant 2018YFB1003702, in part by the National Natural Science Foundation of China under Grant 61902336 and Grant 61703157, in part by the Hunan Province Science and Technology Project Funds under Grant 2018TP1036, and in part by the CERNET Innovation Project under Grant NGII20160310.

Author information

Authors and Affiliations

The Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, China
Jiang Zhu, Lizan Wang, Tingrui Pei & Zhetao Li
The Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, Xiangtan, 411105, China
Jiang Zhu, Lizan Wang, Tingrui Pei & Zhetao Li
The School of Automation and Electronics Information, Xiangtan University, Xiangtan, 411105, China
Jiang Zhu, Lizan Wang, Tingrui Pei & Zhetao Li
Key Laboratory for Embedded and Network Computing of Hunan Province, The College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, 410082, China
Guoqi Xie
The Department of Computer and Information Engineering, Ajou University, Suwon, 443-749, South Korea
Sangyoon Oh

Corresponding author

Correspondence to Tingrui Pei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, J., Wang, L., Xie, G. et al. A low redundancy and high time efficiency large-scale task assignment strategy for heterogeneous service-oriented cloud computing systems. J Supercomput 77, 3450–3483 (2021). https://doi.org/10.1007/s11227-020-03403-x

Published: 20 August 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11227-020-03403-x
