Abstract
Data centers are critical environments that support a wide range of services and applications, and they must therefore guarantee high availability and reliability. This work proposes a strategy based on models, service level agreement (SLA) contracts, maintenance policies, and optimization techniques for assessing the cost and availability of data center electrical infrastructures. The proposed optimization strategy is based on design of experiments (DoE) and uses the availability importance index to identify the equipment with the greatest impact on system availability, so that improvements can be proposed. In addition, a hybrid modeling approach that combines the advantages of stochastic Petri nets and reliability block diagrams is adopted to assess availability. To illustrate the applicability of the proposed approach, two case studies were carried out. In the first, the proposed strategy was compared with a brute-force algorithm and obtained results close to the optimum in a fraction of the time: the brute-force evaluation took more than 100 minutes, while the proposed strategy took only 6 seconds.
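As a rough illustration of how an importance index can point to the equipment that most affects availability, the sketch below ranks the components of a purely series power chain. It is not the authors' model: the component names and MTTF/MTTR values are hypothetical, the series structure is an assumption, and the index used is the classical Birnbaum-style measure (system availability with the component always up minus system availability with it always down), which may differ from the availability importance index used in the paper.

```python
from math import prod

def availability(mttf, mttr):
    """Steady-state availability of one component: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def series_availability(avails):
    """A series system is up only if every component is up."""
    return prod(avails)

def birnbaum_importance(avails, system_fn):
    """Birnbaum-style importance: A_sys(component i up) - A_sys(component i down)."""
    scores = []
    for i in range(len(avails)):
        up = list(avails)
        up[i] = 1.0
        down = list(avails)
        down[i] = 0.0
        scores.append(system_fn(up) - system_fn(down))
    return scores

# Hypothetical components: name -> (MTTF in hours, MTTR in hours).
components = {
    "ups": (250_000, 8),
    "transformer": (1_400_000, 150),
    "power_strip": (200_000, 2),
}

names = list(components)
avails = [availability(mttf, mttr) for mttf, mttr in components.values()]
system_availability = series_availability(avails)
scores = birnbaum_importance(avails, series_availability)

# Component names ordered from most to least important.
ranking = [name for _, name in sorted(zip(scores, names), reverse=True)]

print(f"system availability: {system_availability:.6f}")
for name in ranking:
    print(f"{name}: importance {scores[names.index(name)]:.6f}")
```

In a series arrangement the least available component receives the highest score, so with these assumed values the transformer would be the first candidate for redundancy or a better maintenance policy. A model with redundant paths would need a different `system_fn`.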
Acknowledgements
The authors would like to thank FACEPE, CNPq, and CAPES for their support of this research.
Cite this article
Melo, F., Andrade, E. & Callou, G. Optimization of electrical infrastructures at data centers through a DoE-based approach. J Supercomput 78, 406–439 (2022). https://doi.org/10.1007/s11227-021-03874-6
DOI: https://doi.org/10.1007/s11227-021-03874-6