Health Status Assessment for HDDs Based on Bi-LSTM and N-Dimensional Similarity Metric

  • Conference paper
Databases Theory and Applications (ADC 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14386))

Abstract

To reduce the economic losses caused by hard disk failures, researchers have proposed various statistical and machine learning methods based on Self-Monitoring, Analysis and Reporting Technology (SMART) attributes. Predicting hard drive health from SMART attributes, as previous methods do, is effective for triggering appropriate passive fault tolerance mechanisms in advance. Despite their effectiveness, these methods still have significant limitations. Specifically, they define health status according to the remaining time before the drive fails, but ignore changes in SMART features that reflect deteriorating disk health. In this paper, we propose an N-dimensional similarity metric for evaluating the health of HDDs that acts on both the SMART attributes and the time-to-failure of HDDs. In addition, we use a hypothesis test to eliminate abnormal data and propose a Bidirectional LSTM (Bi-LSTM) based model trained with a weighted categorical cross-entropy loss. Experiments on the Backblaze and Baidu datasets show that our method obtains reasonably accurate health status assessments and outperforms previous methods. Code is available at https://github.com/su26225/HDD-Health-Status.
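
The weighted categorical cross-entropy loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three health levels, the class weights, and the function name `weighted_categorical_cross_entropy` are assumptions chosen for illustration, and averaging over the number of samples is only one common normalization.

```python
import math
from typing import Sequence


def weighted_categorical_cross_entropy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    class_weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Return the mean of -w[y] * log(p[y]) over all samples.

    probs[i][k] is the predicted probability of class k for sample i,
    labels[i] is the true class index of sample i, and class_weights[k]
    scales the loss of samples whose true class is k.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp the probability to avoid log(0).
        total += -class_weights[y] * math.log(max(p[y], eps))
    return total / len(labels)


# Hypothetical data: class 0 = healthy (common), class 2 = near failure (rare).
probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
labels = [0, 2]
uniform = weighted_categorical_cross_entropy(probs, labels, [1.0, 1.0, 1.0])
weighted = weighted_categorical_cross_entropy(probs, labels, [1.0, 2.0, 5.0])
print(f"uniform={uniform:.4f} weighted={weighted:.4f}")
```

With the larger weight on the rare class, the misclassified near-failure sample dominates the loss, which is the usual motivation for class weighting on imbalanced disk data.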

Notes

  1. https://en.wikipedia.org/wiki/68-95-99.7_rule
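
The rule referenced in this note states that, for normally distributed data, about 99.7% of values lie within three standard deviations of the mean. A filter built on that rule might look like the sketch below. The exact hypothesis test used in the paper is not shown on this page, so the function name `three_sigma_filter`, the threshold `k`, and the sample readings are all assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def three_sigma_filter(values: Sequence[float], k: float = 3.0) -> List[float]:
    """Keep only the values within k standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


# Hypothetical readings of one SMART attribute: twenty normal values, one spike.
readings = [30.0] * 10 + [32.0] * 10 + [500.0]
cleaned = three_sigma_filter(readings)
print(len(readings), len(cleaned))
```

Because the mean and standard deviation are themselves affected by outliers, this simple filter works best when abnormal values are rare relative to the sample size.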

Acknowledgements

This work is supported by the Open Fund of the Intelligent Terminal Key Laboratory of Sichuan Province (No. SCTLAB-2007).

Author information

Corresponding author

Correspondence to Jie Shao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Su, B., Man, X., Xu, H., Zhou, X., Shao, J. (2024). Health Status Assessment for HDDs Based on Bi-LSTM and N-Dimensional Similarity Metric. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_14

  • DOI: https://doi.org/10.1007/978-3-031-47843-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47842-0

  • Online ISBN: 978-3-031-47843-7

  • eBook Packages: Computer Science, Computer Science (R0)
