Abstract
A key requirement of heterogeneous parallel and distributed computing systems is to maximize processing performance while meeting the agreed-upon QoS. Much work in this field has sought to optimize system performance by improving metrics such as reliability, robustness, and security. However, most of this work assumes that systems run without interruption and seldom considers intrinsic system characteristics such as failure rate, repair rate, and lifetime. In this paper, we study how to achieve high availability for heterogeneous distributed computational systems based on residual lifetime analysis, taking these essential features into account. First, we provide an availability model that takes into account the system's expected residual lifetime. Second, we propose an objective function based on the model and develop a heuristic scheduling algorithm that maximizes availability subject to a makespan constraint. Finally, we demonstrate the advantages of this approach through extensive simulation experiments.
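To make the idea of availability-aware scheduling under a makespan constraint concrete, the following is a minimal sketch and not the model or algorithm of this paper. It assumes exponentially distributed failure and repair times, so that each machine's availability is the textbook steady-state value mu / (lambda + mu), and it does not model residual lifetime. The names `Machine`, `steady_state_availability`, and `greedy_schedule`, as well as the greedy rule itself, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    failure_rate: float  # lambda, failures per hour (assumed exponential)
    repair_rate: float   # mu, repairs per hour (assumed exponential)
    speed: float         # work units processed per hour


def steady_state_availability(m: Machine) -> float:
    """Textbook steady-state availability: mu / (lambda + mu)."""
    return m.repair_rate / (m.failure_rate + m.repair_rate)


def greedy_schedule(tasks, machines, makespan_limit):
    """Place each task on the most available machine that still fits.

    tasks maps a task id to its amount of work. Returns the assignment
    (task id -> machine name) and the resulting makespan. This is an
    illustrative greedy rule, not the heuristic proposed in the paper.
    """
    load = {m.name: 0.0 for m in machines}
    assignment = {}
    # Try machines in order of decreasing availability.
    ranked = sorted(machines, key=steady_state_availability, reverse=True)
    # Place the largest tasks first so they are less likely to be stranded.
    for task_id, work in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        for m in ranked:
            finish = load[m.name] + work / m.speed
            if finish <= makespan_limit:
                load[m.name] = finish
                assignment[task_id] = m.name
                break
        else:
            raise ValueError(f"task {task_id!r} does not fit within the makespan limit")
    return assignment, max(load.values())


machines = [
    Machine("A", failure_rate=0.01, repair_rate=0.99, speed=1.0),
    Machine("B", failure_rate=0.10, repair_rate=0.90, speed=1.0),
]
tasks = {"t1": 4.0, "t2": 3.0, "t3": 2.0}
assignment, makespan = greedy_schedule(tasks, machines, makespan_limit=5.0)
print(assignment, makespan)
```

In this toy instance the more available machine A receives the largest task, and the remaining tasks spill over to machine B once placing them on A would exceed the makespan limit. A real objective would weigh availability against completion time rather than applying a fixed preference order.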
Cite this article
Lin, C., Jiang, X., Yin, H. et al. Optimizing availability and QoS of heterogeneous distributed system based on residual lifetime in uncertain environment. J Supercomput 48, 243–263 (2009). https://doi.org/10.1007/s11227-008-0217-x
DOI: https://doi.org/10.1007/s11227-008-0217-x