Abstract
In addition to traditional reactive fault-tolerant technologies such as erasure codes and replication, proactive fault tolerance can significantly improve a system’s reliability. To the best of our knowledge, however, there are no previous publications on the reliability of cloud storage systems with proactive fault tolerance, apart from those on RAID systems. In this paper, we propose several Markov-based models to evaluate, from the system perspective, the reliability of cloud storage systems with and without proactive fault tolerance. Because proactive measures must be coupled with reactive measures to ensure the system’s reliability, the reliability model for such a system is intricate. To facilitate model building, we propose the basic state transition unit (BSTU), which describes the general pattern of state transition in proactive cloud storage systems. The BSTU serves as the generic “brick” for building the overall reliability model of such a system. Using our models, we demonstrate the benefits of proactive fault tolerance for a system’s reliability and estimate the impact of several system parameters on it.
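As a rough illustration of what a Markov-based reliability evaluation can look like, the following sketch computes the mean time to data loss (MTTDL) of a small replicated group. It is a generic textbook birth-death chain, not the BSTU-based models proposed in the paper. The function names, the single-repair-at-a-time assumption, and the way proactive fault tolerance is represented (a fraction of failures is predicted and handled before it occurs, which lowers the effective failure rate) are all simplifying assumptions made for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def mttdl(n_replicas, fail_rate, repair_rate):
    """Expected time until all replicas are lost, starting with none failed.

    State i means i replicas have failed; state n_replicas (data loss) is
    absorbing. Assumed rates: (n_replicas - i) * fail_rate for i -> i + 1,
    and repair_rate for i -> i - 1 (one repair at a time).
    """
    n = n_replicas
    A = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for i in range(n):
        up = (n - i) * fail_rate
        down = repair_rate if i > 0 else 0.0
        # Balance equation: (up + down) * t_i - up * t_{i+1} - down * t_{i-1} = 1
        A[i][i] = up + down
        if i + 1 < n:
            A[i][i + 1] = -up
        if i > 0:
            A[i][i - 1] = -down
    return solve(A, b)[0]


def mttdl_proactive(n_replicas, fail_rate, repair_rate, predicted_fraction):
    """Hypothetical proactive variant: predicted failures are assumed to be
    handled in advance, so only the unpredicted share degrades the group."""
    if not 0.0 <= predicted_fraction < 1.0:
        raise ValueError("predicted_fraction must be in [0, 1)")
    return mttdl(n_replicas, fail_rate * (1.0 - predicted_fraction), repair_rate)


if __name__ == "__main__":
    # Illustrative rates only (per hour); not taken from the paper.
    reactive = mttdl(3, 1e-5, 0.1)
    proactive = mttdl_proactive(3, 1e-5, 0.1, 0.9)
    print(f"reactive only: {reactive:.3e} h, with prediction: {proactive:.3e} h")
```

For two replicas this reduces to the familiar closed form (3λ + μ) / (2λ²), which gives a quick sanity check. A fuller model would track predicted-but-not-yet-failed drives as separate states rather than folding prediction into the failure rate.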
Acknowledgments
This work is partially supported by NSF of China (61373018, 11301288, 11450110409), Program for New Century Excellent Talents in University (NCET130301), the Fundamental Research Funds for the Central Universities (65141021) and the Ph.D. Candidate Research Innovation Fund of Nankai University.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, J., Li, M., Wang, G., Liu, X., Li, Z., Tang, H. (2015). Global Reliability Evaluation for Cloud Storage Systems with Proactive Fault Tolerance. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_14
DOI: https://doi.org/10.1007/978-3-319-27140-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27139-2
Online ISBN: 978-3-319-27140-8
eBook Packages: Computer Science, Computer Science (R0)