Abstract
To reduce the economic losses caused by hard disk failures, researchers have proposed various statistical and machine learning methods based on Self-Monitoring, Analysis and Reporting Technology (SMART) attributes. Predicting hard drive health from SMART attributes, as these methods do, makes it possible to apply different passive fault tolerance mechanisms in advance. Although effective, these methods still have a significant limitation: they define health status according to the remaining time before a drive fails, and they ignore the changes in SMART features that reflect deteriorating disk health. In this paper, we propose an N-dimensional similarity metric to evaluate the health of hard disk drives (HDDs) that acts on both SMART attributes and time-to-failure. In addition, we use a hypothesis test to eliminate abnormal data and propose a Bidirectional LSTM (Bi-LSTM) based model with a weighted categorical cross-entropy loss. Experiments on the Backblaze and Baidu datasets show that our method obtains reasonably accurate health status assessments and outperforms previous methods. Code is available at https://github.com/su26225/HDD-Health-Status.
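As a rough illustration of the weighted categorical cross-entropy loss named in the abstract, the sketch below computes the standard form of that loss in plain Python: the mean over samples of -sum_c w_c * y_c * log(p_c). The function name, the three health levels, and the class weights are assumptions made for illustration only; the abstract does not give the number of health levels or the weighting scheme, so this should not be read as the authors' implementation.

```python
import math
from typing import Sequence


def weighted_categorical_cross_entropy(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
    class_weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Mean over samples of -sum_c w_c * y_true[c] * log(y_pred[c]).

    y_true holds one-hot (or soft) labels, y_pred holds predicted class
    probabilities, and class_weights holds one non-negative weight per class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        for w, t, p in zip(class_weights, t_row, p_row):
            # Clamp the probability so log(0) cannot occur.
            total -= w * t * math.log(max(p, eps))
    return total / len(y_true)


# Hypothetical setup: three health levels, with the rarest (closest to
# failure) level weighted most heavily. These weights are illustrative.
weights = [1.0, 2.0, 4.0]
labels = [[0.0, 0.0, 1.0]]       # one sample whose true class is level 2
probs = [[0.25, 0.25, 0.5]]      # predicted probabilities for that sample
loss = weighted_categorical_cross_entropy(labels, probs, weights)
```

With uniform weights of 1.0 the function reduces to ordinary categorical cross-entropy; larger weights on under-represented health levels increase the penalty for misclassifying them, which is the usual motivation for weighting when failing drives are rare in the data.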
Acknowledgements
This work is supported by the Open Fund of the Intelligent Terminal Key Laboratory of Sichuan Province (No. SCTLAB-2007).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Su, B., Man, X., Xu, H., Zhou, X., Shao, J. (2024). Health Status Assessment for HDDs Based on Bi-LSTM and N-Dimensional Similarity Metric. In: Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., Yang, Z. (eds) Databases Theory and Applications. ADC 2023. Lecture Notes in Computer Science, vol 14386. Springer, Cham. https://doi.org/10.1007/978-3-031-47843-7_14
DOI: https://doi.org/10.1007/978-3-031-47843-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47842-0
Online ISBN: 978-3-031-47843-7
eBook Packages: Computer Science (R0)