Global Reliability Evaluation for Cloud Storage Systems with Proactive Fault Tolerance

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9531)

Abstract

In addition to traditional reactive fault-tolerance techniques such as erasure codes and replication, proactive fault tolerance can significantly improve a system’s reliability. To the best of our knowledge, however, there are no previous publications on the reliability of such cloud storage systems, apart from those on RAID systems. In this paper, we propose several Markov-based models to evaluate, from the system perspective, the reliability of cloud storage systems with and without proactive fault tolerance. Because a proactive measure must be coupled with a reactive measure to ensure the system’s reliability, the reliability model for such a system is intricate. To facilitate model building, we propose the basic state transition unit (BSTU), which describes the general pattern of state transitions in proactive cloud storage systems. The BSTU serves as the generic “brick” for building the overall reliability model of such a system. Using our models, we demonstrate the benefits of proactive fault tolerance for a system’s reliability and estimate the impact of several system parameters on it.
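As a rough illustration of what a Markov-based reliability model computes, the sketch below estimates the mean time to data loss (MTTDL) of a single mirrored pair of drives, with and without a failure predictor. It is not the BSTU model from the paper, which this page does not describe in detail. The state names (`H` healthy, `W` warned, `D` degraded, `LOSS`), the rates `lam`, `mu`, `gamma`, `delta`, the prediction fraction `p`, and all numeric values are assumptions chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean_time_to_absorption(rates, transient, start):
    """Expected time until a continuous-time Markov chain leaves the transient states.

    rates maps (source, destination) to a transition rate; any destination not
    listed in transient is treated as absorbing (here: data loss).
    Solves (-Q_TT) t = 1, where Q_TT is the generator restricted to transient states.
    """
    idx = {s: k for k, s in enumerate(transient)}
    n = len(transient)
    a = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        i = idx[src]
        a[i][i] += rate              # total outflow goes on the diagonal
        if dst in idx:
            a[i][idx[dst]] -= rate   # flow into another transient state
    return solve_linear(a, [1.0] * n)[idx[start]]


TRANSIENT = ["H", "W", "D"]


def mirrored_pair_rates(lam, mu, p, gamma, delta):
    """Hypothetical four-state model of a two-replica group (all rates per hour).

    lam: per-drive failure rate; mu: reactive rebuild rate;
    p: fraction of failures the predictor flags in advance (assumed);
    gamma: proactive migration rate; delta: rate at which a flagged drive
    fails before migration completes.
    """
    return {
        ("H", "W"): 2 * lam * p,        # a drive is flagged before it fails
        ("H", "D"): 2 * lam * (1 - p),  # a drive fails without warning
        ("W", "H"): gamma,              # migration finishes, drive replaced
        ("W", "D"): delta + lam,        # flagged drive or its partner fails first
        ("D", "H"): mu,                 # reactive rebuild completes
        ("D", "LOSS"): lam,             # last remaining copy fails
    }


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9):
        rates = mirrored_pair_rates(lam=1e-4, mu=0.1, p=p, gamma=0.5, delta=0.01)
        hours = mean_time_to_absorption(rates, TRANSIENT, "H")
        print(f"p={p:.1f}  MTTDL ~ {hours:.3e} hours")
```

With `p = 0` the warned state is never entered and the model reduces to the classic mirrored-pair chain, whose MTTDL has the closed form (3·lam + mu) / (2·lam²). Raising `p` diverts part of the failures through the warned state, where migration can return the pair to full health without a degraded period, so the computed MTTDL grows.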

Acknowledgments

This work is partially supported by NSF of China (61373018, 11301288, 11450110409), Program for New Century Excellent Talents in University (NCET130301), the Fundamental Research Funds for the Central Universities (65141021) and the Ph.D. Candidate Research Innovation Fund of Nankai University.

Author information

Corresponding authors

Correspondence to Gang Wang or Zhongwei Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, J., Li, M., Wang, G., Liu, X., Li, Z., Tang, H. (2015). Global Reliability Evaluation for Cloud Storage Systems with Proactive Fault Tolerance. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_14

  • DOI: https://doi.org/10.1007/978-3-319-27140-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27139-2

  • Online ISBN: 978-3-319-27140-8

  • eBook Packages: Computer Science, Computer Science (R0)
