Elsevier

Neurocomputing

Volume 256, 20 September 2017, Pages 90-100

Fault-tolerant system design on cloud logistics by greener standbys deployment with Petri net model

https://doi.org/10.1016/j.neucom.2016.08.134

Abstract

Cost-aware enhancement of fault tolerance is an important service-quality issue for cloud platforms. To approach this goal with a greener design, a novel server backup strategy is adopted that deploys two types of standby server: warm standbys and cold standbys. Under this two-level standby scheme, the cost structure is explored in terms of the deployment ratio between warm and cold standbys; cold standbys offer a greener power solution than conventional warm standbys. An optimal cost policy is proposed that maintains the regulated quality of service for cloud customers. On the qualitative side, a Petri net model is developed to visualize the operational flow of the whole system. On the quantitative side, for decision support, the theory of the finite source queue is elaborated and a comprehensive mathematical analysis of the cost pattern is carried out in detail. Simulations are conducted to validate the proposed cost optimization model as well. Regarding the green contribution, the power saved by switching warm standbys to cold standbys is estimated and translated into the corresponding reduction of CO2 emission. Hence, the proposed approach provides a feasible standby architecture that meets the economics of cloud logistics with a greener deployment.

Introduction

The cloud environment has gained popularity as the mainstream platform for transforming a large part of the IT industry, making software more attractive as a service and reshaping the way IT hardware is designed and purchased [1], [2]. With the ever-increasing market demand for cloud platforms, system design for resource management has attracted growing attention from both industry and academia. The field of high-reliability, high-availability, fault-tolerant computing was developed for the critical needs of military and space applications. Fault-tolerant computing is a generic term describing redundant design techniques with duplicate components that enable uninterrupted service in response to component failure [3]. To counter the influence of faulty components, it is essential to alleviate the unavoidable impact of server breakdowns during service. Hence, cloud logistics needs an optimization design, or cost model, for the spare profile to meet the long-term management needs of the cloud platform [4], [5], [6].

The standby concept is the basic scheme for maintaining operation with a regulated quality of service (QoS) even when some components fail. The proposed system architecture consists of three modules: an operating module, a spares module, and a repair facility. The number of servers in the operating module is set according to the QoS requirements. Whenever one of the operating servers fails, it is immediately replaced by a standby server from the spares module, and the failed server is delivered to the repair facility. The design goal of this work is to explore the following issue: from a cost-based logistics perspective, how many standby servers in the spares module are optimal if a certain level of server availability must be kept? To study this tradeoff, the proposed optimization technique may provide the cloud expert with decision support on the number of spare servers, including warm standbys and cold standbys [7].
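
To make this replacement flow concrete, the following minimal sketch (in Python, not taken from the paper) tracks the bookkeeping among the three modules; the promotion order (warm spares before cold) and the policy that repaired servers rejoin the pool as cold standbys are illustrative assumptions.

```python
"""Minimal bookkeeping sketch of the three-module standby architecture.

Hypothetical illustration only: the class name, the promotion order
(warm before cold), and the rule that repaired servers rejoin the pool
as cold standbys are assumptions, not taken from the paper.
"""

from dataclasses import dataclass


@dataclass
class CloudLogisticsState:
    operating: int      # servers currently serving cloud customers (OR target)
    warm_standby: int   # powered-on spares (WS)
    cold_standby: int   # powered-off spares (CS)
    in_repair: int = 0  # failed servers queued at the repair facility

    def operating_server_fails(self) -> None:
        """An operating server fails: send it to repair and promote a spare."""
        self.operating -= 1
        self.in_repair += 1
        if self.warm_standby > 0:        # warm spares take over first
            self.warm_standby -= 1
            self.operating += 1
        elif self.cold_standby > 0:      # otherwise power up a cold spare
            self.cold_standby -= 1
            self.operating += 1
        # if no spares remain, the operating level drops below the QoS target

    def repair_completes(self) -> None:
        """The repair facility finishes one server; it rejoins the spare pool."""
        if self.in_repair == 0:
            return
        self.in_repair -= 1
        self.cold_standby += 1           # assumed policy: return as a cold spare


if __name__ == "__main__":
    state = CloudLogisticsState(operating=10, warm_standby=2, cold_standby=3)
    state.operating_server_fails()
    state.operating_server_fails()
    state.repair_completes()
    print(state)  # operating=10, warm_standby=0, cold_standby=4, in_repair=1
```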

On the balance of profit and cost, the logistics expense drives the profit downward, although it stabilizes the profit income by keeping the QoS in good condition. Moreover, a cloud provider with an excellent reputation for service quality may attract more potential cloud users and hence more profit. To study this tradeoff, the proposed optimization technique may provide the cloud expert with a deployment scheme for the number of spare servers in case of server failure, aimed at profit optimization. The framework for the cloud logistics system is based on the theory of the finite source queue (FSQ) model [8], [9] with the two-level standby scheme. In a given planning period, a finite number of operating (online) servers provide cloud service under contract-based commitments to cloud customers. This finite set of operating servers maps directly onto the "finite source" of the FSQ model in queuing theory. To the best of our knowledge, applying such a finite-source model with a two-level standby scheme to the logistics strategy for cost optimization in the cloud environment is a novel contribution.

The design goal of this work is to explore the following issue: to approach greener cloud logistics, the proposed two-level standbys are deployed with both warm and cold standbys instead of warm standbys only. How many cold-standby servers are optimal if the number of warm standbys is fixed or regulated and a certain level of server availability must be kept? To study this tradeoff, the proposed cost optimization technique may provide the cloud expert with decision support on the number of spare servers. The key contributions of this paper are threefold: (1) this work provides the cloud administrator with a feasible greener logistics framework for cost optimization; from a management perspective, the proposed system can serve as a decision-making methodology that enables predictive rather than reactive or chaotic management. (2) On the qualitative side, a Petri net model is developed to visualize the operational flow of the whole system; on the quantitative side, theoretical expressions are newly derived and the relevant system metrics are established. (3) On the verification side, experiments are conducted to establish the feasibility of the cost optimization. The simulation results indicate that the proposed approach can provide feasible decision support for the deployment of standby servers. To perceive the green effect of cold-standby deployment, an exemplified estimation converting kilowatt-hours of electricity into CO2 emission is provided to shed light on the proposed scheme.
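
To make this kilowatt-hour-to-CO2 conversion concrete, a back-of-the-envelope sketch follows; the standby power figures and the grid emission factor are illustrative assumptions rather than values from the paper.

```python
# Illustrative kWh-to-CO2 estimate for switching warm standbys to cold ones.
# All numbers below are assumptions for the sake of example, not from the paper.

HOURS_PER_YEAR = 24 * 365
WARM_STANDBY_WATTS = 150.0        # assumed idle power of a powered-on spare
COLD_STANDBY_WATTS = 5.0          # assumed residual draw of a powered-off spare
EMISSION_FACTOR_KG_PER_KWH = 0.5  # assumed grid emission factor (kg CO2 / kWh)


def annual_co2_saving_kg(servers_switched: int) -> float:
    """CO2 saved per year by converting `servers_switched` warm spares to cold."""
    saved_kwh = ((WARM_STANDBY_WATTS - COLD_STANDBY_WATTS) / 1000.0
                 * HOURS_PER_YEAR * servers_switched)
    return saved_kwh * EMISSION_FACTOR_KG_PER_KWH


if __name__ == "__main__":
    # e.g. switching 4 warm standbys into cold spares
    print(f"{annual_co2_saving_kg(4):.0f} kg CO2 per year")
```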

The rest of the paper is organized as follows: Section 2 describes related work and the motivation behind this research. From a qualitative perspective, Section 3 provides a detailed systematic diagram using a Petri net model to visualize the operational flow of the proposed two-level logistics framework. In the quantitative work of Section 4, a detailed mathematical analysis is conducted and the relevant system performance measures, such as the expected number of operating servers and the expected number of spares, are derived. Following this, Section 5 addresses the two-level logistics system in terms of a cost function and reports simulations demonstrating the feasibility of the proposed scheme; the reduction of CO2 due to the power saved by switching warm standbys to cold standbys is addressed as well. Finally, some concluding remarks are made in Section 6.

Section snippets

Related work

Backup servers are rudimentary and effective for meeting the fault-tolerance requirements of cloud services against hardware failures and disasters. Hu et al. [10] proposed a backup server sharing scheme in the Inter-cloud to reduce the cost of backup servers. Their numerical modeling was based solely on availability computation, without any solid queuing material. For node failure in a CC network, an algorithm was proposed to estimate the network performance under a maintenance budget with …

Two-level standbys in cloud logistics system

The proposed framework of two-level standbys in a cloud logistics system (TS-CLS) can be modeled as an M/M/1 finite source queue (FSQ) with two-level standbys, also termed the machine repair problem [8], [9]. The term "finite source" in FSQ theory represents the finite quantity of operating servers provided by the cloud provider to guarantee the QoS. In this FSQ model, a cluster of T = OR + CS + WS identical servers (operating and standby) is maintained by a repairman in the repair facility. There …
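
Because the snippet above is truncated, the following sketch encodes only one plausible simplified reading of such a model: the state is taken to be the number of failed servers, operating servers fail at rate λ, warm standbys fail at a lower rate α, cold standbys do not fail while powered off, and a single repairman repairs at rate μ. These assumptions, and the aggregation into a single chain, are for illustration only; the paper's transition structure (Fig. 5) is richer.

```python
# Rough sketch of the transition rates of a machine-repair (finite-source)
# model with warm and cold spares, aggregated by the number of failed servers.
# Assumed (not from the paper): exponential failures/repairs, warm spares
# replace failed operating servers before cold spares, single repairman.

def failure_rate(n, OR, WS, CS, lam, alpha):
    """Total failure rate when n of the T = OR + WS + CS servers are down."""
    spares = WS + CS
    if n < spares:                            # full operating level maintained
        warm_left = max(WS - n, 0)            # warm spares are consumed first
        return OR * lam + warm_left * alpha   # operating + warm-standby failures
    return max(OR + spares - n, 0) * lam      # spares exhausted: degraded level


def repair_rate(n, mu):
    """Repair rate with a single repairman fixing one server at a time."""
    return mu if n > 0 else 0.0


if __name__ == "__main__":
    # hypothetical parameters: 10 operating servers, 2 warm and 3 cold spares
    OR, WS, CS, lam, alpha, mu = 10, 2, 3, 0.05, 0.01, 1.0
    for n in range(OR + WS + CS + 1):
        print(n, round(failure_rate(n, OR, WS, CS, lam, alpha), 3),
              repair_rate(n, mu))
```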

Mathematics and steady-state solutions

The proposed framework is configured with T = OR + CS + WS identical servers and one repairman in the cloud logistics system. OR of these servers are regulated to meet the service requirements under contract-based commitments to the cloud users; the remaining CS and WS servers are regarded as standby servers. To obtain analytic steady-state results for the proposed model, we first construct the state-transition-rate diagram depicted in Fig. 5. There are three chains of …
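
Under the same simplified assumptions as the sketch above, the steady-state distribution of a birth-death chain follows from the usual product form, and measures such as the expected number of operating servers and the expected number of spares reduce to sums over that distribution. The sketch below is an illustrative reconstruction with hypothetical parameter values, not the paper's derivation.

```python
# Illustrative steady-state solution and performance measures for the
# simplified birth-death reading of TS-CLS (assumptions as in the sketch above).

def steady_state_measures(OR, WS, CS, lam, alpha, mu):
    """Return (E[failed], E[operating], E[spares], availability)."""
    spares, T = WS + CS, OR + WS + CS
    # unnormalized product-form weights w[n] for n failed servers, n = 0..T
    w = [1.0]
    for n in range(T):
        if n < spares:
            rate = OR * lam + max(WS - n, 0) * alpha   # warm spares used first
        else:
            rate = max(OR + spares - n, 0) * lam       # degraded operating level
        w.append(w[-1] * rate / mu)                    # single repairman: mu_n = mu
    total = sum(w)
    P = [x / total for x in w]

    exp_failed = sum(n * p for n, p in enumerate(P))
    exp_operating = sum((OR - max(n - spares, 0)) * p for n, p in enumerate(P))
    exp_spares = sum(max(spares - n, 0) * p for n, p in enumerate(P))
    # availability here = probability that the regulated level OR is fully met
    availability = sum(p for n, p in enumerate(P) if n <= spares)
    return exp_failed, exp_operating, exp_spares, availability


if __name__ == "__main__":
    print(steady_state_measures(OR=10, WS=2, CS=3, lam=0.05, alpha=0.01, mu=1.0))
```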

Cost analysis

Under this two-level standby scheme, cost patterns are elaborated to explore the deployment ratio between warm and cold spares among all standby servers. From the management perspective, the cost-aware issue deserves careful consideration so that the cloud business can run steadily in the long term. It is assumed that the proposed TS-CLS is regulated to maintain at least OR servers in operation to sustain the guaranteed quality of service. Hence, the cost per unit time of each server downtime in the operating FM is …
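
As an illustration of how such a cost pattern can drive the deployment decision, the sketch below assembles a per-unit-time cost from hypothetical coefficients (a downtime penalty, holding costs for warm and cold spares, and a repair-facility charge) and searches for the number of cold standbys that minimizes it subject to an availability floor. Both the cost structure and every coefficient are assumptions, not the paper's cost function.

```python
# Hypothetical cost sketch: find the number of cold standbys CS that minimizes
# a per-unit-time cost while keeping availability above a floor. The cost
# structure and all coefficients are assumptions for illustration only.

def measures(OR, WS, CS, lam, alpha, mu):
    """Birth-death sketch: expected operating shortfall and availability."""
    spares, T = WS + CS, OR + WS + CS
    w = [1.0]
    for n in range(T):
        if n < spares:
            rate = OR * lam + max(WS - n, 0) * alpha
        else:
            rate = max(OR + spares - n, 0) * lam
        w.append(w[-1] * rate / mu)
    total = sum(w)
    P = [x / total for x in w]
    exp_down = sum(max(n - spares, 0) * p for n, p in enumerate(P))
    availability = sum(p for n, p in enumerate(P) if n <= spares)
    return exp_down, availability


def cost_per_unit_time(OR, WS, CS, lam, alpha, mu,
                       c_down=500.0, c_warm=20.0, c_cold=4.0, c_repair=10.0):
    """Assumed cost: downtime penalty + warm/cold holding costs + repair facility."""
    exp_down, _ = measures(OR, WS, CS, lam, alpha, mu)
    return c_down * exp_down + c_warm * WS + c_cold * CS + c_repair


def best_cold_standby_count(OR, WS, lam, alpha, mu, cs_max=10, min_avail=0.99):
    """Grid search over CS subject to an (assumed) availability floor."""
    best = None
    for CS in range(cs_max + 1):
        _, avail = measures(OR, WS, CS, lam, alpha, mu)
        if avail < min_avail:
            continue
        cost = cost_per_unit_time(OR, WS, CS, lam, alpha, mu)
        if best is None or cost < best[1]:
            best = (CS, cost, avail)
    return best


if __name__ == "__main__":
    print(best_cold_standby_count(OR=10, WS=2, lam=0.05, alpha=0.01, mu=1.0))
```

With the coefficients assumed here, adding cold spares raises the holding cost faster than it reduces the downtime penalty, so the search settles on the smallest CS that satisfies the availability floor; different coefficient choices would shift that balance.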

Conclusion

Fault-tolerant system design for cloud platforms is an important issue in cloud computing and is of crucial interest for long-term operation. If the fault-tolerant system is not well designed and implemented, the inevitable breakdown of servers in operation would compromise customers' brand loyalty due to the deterioration of cloud QoS. To this end, an optimization framework on the cost profile has been proposed using finite-source queuing theory with two-level standbys.

References (24)

  • B. Hu et al.

    Cost reduction evaluation of sharing backup servers in inter-cloud

  • L. Chen et al.

    A system architecture for intelligent logistics system

Dr. Fuu-Cheng Jiang worked as a design engineer with the Aeronautical Research Lab., Chung Shan Institute of Science and Technology (CSIST), when he was assigned to a partnership project at General Dynamics, Fort Worth, Texas. He is currently a faculty member in the Department of Computer Science at Tunghai University in Taiwan. Dr. Jiang was the recipient of the Best Paper Award at the 5th International Conference on Future Information Technology (FutureTech 2010), which ranked his paper first among the 201 submissions. He has served on dozens of technical program committees for international conferences, including BWCCA 2010, ICCCT 2011–2012, IEEE CloudCom 2012, IEEE BIOCAS-2013, NPC 2014, SC2 2014, IEEE SPICES 2015, SC2 2015, IoT 2015 and CCBD 2015, and has also served as Session Chair of CSE 2011 and IEEE ICCE-Taiwan 2014 and as Publication Chair of NPC 2014. Moreover, he has served as a journal reviewer for The Computer Journal, Ad Hoc Networks, the Journal of Network and Computer Applications (JNCA), the Journal of Supercomputing (JOS), the Journal of Internet Technology (JIT), the International Journal of Communication Systems (IJCS), and IEEE Transactions on Cloud Computing. Dr. Jiang has served as an Editorial Board Member of the CIP-JWCMCN Journal and as an assistant editor on the Cloud-Link Editorial Team of the IEEE. His research interests include network modeling, cloud computing services, wireless sensor networks and simulation. Dr. Jiang is a member of the IEEE.

Ching-Hsien (Robert) Hsu is a professor in the Department of Computer Science and Information Engineering at Chung Hua University, Taiwan, and a distinguished chair professor in the School of Computer and Communication Engineering at Tianjin University of Technology, China. His research includes high performance computing, cloud computing, parallel and distributed systems, big data analytics, ubiquitous/pervasive computing and intelligence. He has published 200 papers in refereed journals, conference proceedings and book chapters in these areas. Dr. Hsu is the editor-in-chief of the International Journal of Grid and High Performance Computing and the International Journal of Big Data Intelligence, and serves on the editorial boards of a number of prestigious journals, including IEEE Transactions on Services Computing and IEEE Transactions on Cloud Computing. He has been an author/co-author or an editor/co-editor of 10 books from Springer, IGI Global, World Scientific and McGraw-Hill. He has also edited a number of special issues for top journals, such as IEEE Transactions on Cloud Computing, IEEE Transactions on Services Computing, IEEE Systems Journal, Future Generation Computer Systems and the Journal of Supercomputing. Prof. Hsu received the distinguished award for excellence in research and the annual outstanding research award from Chung Hua University eight times between 2005 and 2015. He has served on the Executive Committee of the Taiwan Association of Cloud Computing (TACC) from 2008 to 2012, the Executive Committee of the IEEE Technical Committee on Scalable Computing (2008–2012), and IEEE Cloud Computing (2012–present). Dr. Hsu is an IEEE senior member.
