Global Reliability Evaluation for Cloud Storage Systems with Proactive Fault Tolerance

Li, Jing; Li, Mingze; Wang, Gang; Liu, Xiaoguang; Li, Zhongwei; Tang, Huijun

doi:10.1007/978-3-319-27140-8_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9531)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

In addition to traditional reactive fault-tolerant technologies, such as erasure codes and replication, proactive fault tolerance can be used to improve a system's reliability significantly. To the best of our knowledge, however, there are no previous publications on the reliability of such cloud storage systems, only on RAID systems. In this paper, we propose several Markov-based models to evaluate, from the system perspective, the reliability of cloud storage systems with and without proactive fault tolerance. Since a proactive measure should be coupled with some reactive measure to ensure the system's reliability, the reliability model for such a system is very intricate. To facilitate model building, we propose the basic state transition unit (BSTU) to describe the general pattern of state transition in proactive cloud storage systems. The BSTU serves as the generic "brick" for building the overall reliability model for such a system. Using our models, we demonstrate the benefits that proactive fault tolerance has on a system's reliability and estimate the impact of several system parameters on it.
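To make the idea of a Markov-based reliability model concrete, the following is a minimal sketch and not the BSTU-based model proposed in the paper. It assumes a hypothetical mirrored drive pair with three states (both drives healthy, one drive failed and rebuilding, data loss), a per-drive failure rate `lam`, a rebuild rate `mu`, and a prediction accuracy `p`. As a loudly simplifying assumption, a correctly predicted failure is treated as fully avoided by migrating data in time, so only the unpredicted fraction `1 - p` of failures leaves the healthy state. All names and parameter values are illustrative.

```python
def mean_time_to_absorption(Q):
    """Mean time to absorption from each transient state of a CTMC.

    Q is the generator matrix restricted to the transient states.
    Solves (-Q) t = 1 by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(Q)
    # Augmented matrix [-Q | 1].
    A = [[-Q[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


def mirrored_pair_mttdl(lam, mu, p=0.0):
    """MTTDL of a hypothetical mirrored pair (illustrative assumption only).

    State 0: both drives healthy.
    State 1: one drive failed, rebuild in progress.
    Data loss is the absorbing state and is left out of Q.
    """
    a = 2.0 * lam * (1.0 - p)   # unpredicted failure of either drive
    Q = [
        [-a, a],                # healthy -> degraded
        [mu, -(mu + lam)],      # degraded -> healthy (rebuild) or data loss
    ]
    return mean_time_to_absorption(Q)[0]


if __name__ == "__main__":
    lam, mu = 1e-5, 0.1         # per-hour rates, chosen only for illustration
    for p in (0.0, 0.5, 0.9):
        print(f"p={p:.1f}  MTTDL={mirrored_pair_mttdl(lam, mu, p):.3e} hours")
```

With `p = 0` the sketch reduces to the classic reactive-only mirrored-pair result, MTTDL = (3λ + μ) / (2λ²), which gives a quick sanity check on the solver. A realistic model would also need to account for false alarms, the time taken to migrate data off a drive predicted to fail, and failures during that window.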

Acknowledgments

This work is partially supported by NSF of China (61373018, 11301288, 11450110409), Program for New Century Excellent Talents in University (NCET130301), the Fundamental Research Funds for the Central Universities (65141021) and the Ph.D. Candidate Research Innovation Fund of Nankai University.

Author information

Authors and Affiliations

Nankai-Baidu Joint Lab, College of Computer and Control Engineering and College of Software, Nankai University, Tianjin, 300350, China
Jing Li, Mingze Li, Gang Wang, Xiaoguang Liu & Zhongwei Li
Qihoo 360 Technology Company, Beijing, 100621, China
Huijun Tang

Corresponding authors

Correspondence to Gang Wang or Zhongwei Li.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Guojun Wang
The University of Sydney, Sydney, New South Wales, Australia
Albert Zomaya
University of Murcia, Murcia, Murcia, Spain
Gregorio Martinez
Hunan University, Changsha, China
Kenli Li

About this paper

Cite this paper

Li, J., Li, M., Wang, G., Liu, X., Li, Z., Tang, H. (2015). Global Reliability Evaluation for Cloud Storage Systems with Proactive Fault Tolerance. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_14

DOI: https://doi.org/10.1007/978-3-319-27140-8_14
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27139-2
Online ISBN: 978-3-319-27140-8
eBook Packages: Computer Science, Computer Science (R0)
