
ASLDP: An Active Semi-supervised Learning method for Disk Failure Prediction

Published: 05 October 2021 Publication History

Abstract

Disk failure has long been a major problem for data centers because it leads to data loss. Existing work mostly applies supervised learning, training models offline on large numbers of labeled samples. However, these offline methods are no longer suitable for disk failure prediction in today's big data environment. Faced with this explosive growth in data, most methods do not take into account that the labels needed for model training may be hard to obtain, or that the labels obtained may not be fully accurate. These problems further restrict the development of supervised learning and offline modeling for disk failure prediction. This paper proposes ASLDP, a novel disk failure prediction method that combines active learning and semi-supervised learning. Based on the characteristics of data across the disk life cycle, ASLDP applies active learning to clearly labeled samples, selecting valuable samples and eliminating redundant ones. For samples that are unclearly labeled or unlabeled, ASLDP uses semi-supervised learning to pre-label them and then applies active learning to improve generalization. Results on three realistic datasets show that ASLDP achieves stable failure detection rates of 80-85% with low false alarm rates, compared with current online learning methods. Furthermore, compared with current offline learning methods, ASLDP overcomes the problems of missing sample labels and data redundancy in massive-data environments.
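The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea of combining semi-supervised pre-labeling with active learning, not the authors' ASLDP implementation. It assumes a toy nearest-centroid scorer, and the names `failure_score`, `split_unlabeled`, `confident`, and `query_budget` are hypothetical. Unlabeled samples that the current model scores confidently receive a pseudo-label, while the most ambiguous ones are set aside to be labeled by an expert.

```python
import math


def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def failure_score(x, healthy_c, failed_c):
    """Score in [0, 1]; values near 1 mean x lies closer to the failed centroid."""
    d_h = math.dist(x, healthy_c)
    d_f = math.dist(x, failed_c)
    if d_h + d_f == 0:
        return 0.5  # both centroids coincide with x: no information
    return d_h / (d_h + d_f)


def split_unlabeled(labeled, unlabeled, confident=0.8, query_budget=1):
    """One round of pseudo-labeling plus uncertainty sampling (illustrative only).

    labeled:   list of (features, label) pairs, label 1 = failed, 0 = healthy
    unlabeled: list of feature vectors
    Returns (pseudo_labeled, to_query).
    """
    healthy_c = centroid([x for x, y in labeled if y == 0])
    failed_c = centroid([x for x, y in labeled if y == 1])

    pseudo_labeled = []
    uncertain = []
    for x in unlabeled:
        s = failure_score(x, healthy_c, failed_c)
        if s >= confident:
            pseudo_labeled.append((x, 1))      # confidently "failed"
        elif s <= 1 - confident:
            pseudo_labeled.append((x, 0))      # confidently "healthy"
        else:
            uncertain.append((abs(s - 0.5), x))  # small margin = ambiguous

    # Active-learning step: query the samples closest to the decision boundary.
    uncertain.sort(key=lambda pair: pair[0])
    to_query = [x for _, x in uncertain[:query_budget]]
    return pseudo_labeled, to_query


if __name__ == "__main__":
    labeled = [([0.0, 0.0], 0), ([10.0, 10.0], 1)]
    unlabeled = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.0], [4.0, 4.0]]
    pseudo, to_query = split_unlabeled(labeled, unlabeled)
    print(pseudo)
    print(to_query)
```

In a real pipeline the toy scorer would presumably be replaced by a trained classifier over SMART attributes, and the pseudo-labeled and expert-labeled samples would be fed back into the training set before the next round.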

Cited By

  • (2024) An Efficient Deep Reinforcement Learning-Based Automatic Cache Replacement Policy in Cloud Block Storage Systems. IEEE Transactions on Computers 73:1 (164-177). DOI: 10.1109/TC.2023.3325625. Online publication date: 1-Jan-2024
  • (2022) A Disk Failure Prediction Method Based on Active Semi-supervised Learning. ACM Transactions on Storage 18:4 (1-33). DOI: 10.1145/3523699. Online publication date: 12-Nov-2022
  • (2022) A Multi-Factor Adaptive Multi-Level Cooperative Replacement Policy in Block Storage Systems. 2022 IEEE 40th International Conference on Computer Design (ICCD) (67-75). DOI: 10.1109/ICCD56317.2022.00020. Online publication date: Oct-2022

Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Active Learning
  2. Disk Failure Prediction
  3. Machine Learning
  4. Semi-supervised Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
