A disk failure prediction model for multiple issues

  • Research Article
  • Frontiers of Information Technology & Electronic Engineering

Abstract

Disk failure prediction methods have proven effective at handling individual issues, e.g., heterogeneous disks, model aging, and minority samples. However, because these issues often occur simultaneously, a prediction model that can handle only one of them produces biased predictions in practice. Existing disk failure prediction methods simply fuse various models and do not address how to prepare training data or which learning pattern to adopt when facing multiple issues, even though the solutions to different issues often conflict with each other. Therefore, we first explore training data preparation for multiple issues via a data partitioning pattern, namely our proposed multi-property data partitioning (MDP). Then, we treat learning from the partitioned data as learning multiple tasks and introduce the model-agnostic meta-learning (MAML) framework to carry out this learning. Based on these improvements, we propose a novel disk failure prediction model named MDP-MAML. MDP addresses the challenges of uneven partitioning and of the difficulty of partitioning by time, and MAML addresses the challenge of learning from multiple domains with minority samples across multiple issues. In addition, MDP-MAML can incorporate emerging issues into learning and prediction. On datasets from two real-world data centers, compared with state-of-the-art methods, MDP-MAML improves the area under the curve (AUC) from 0.85 to 0.89 and the failure detection rate (FDR) from 0.85 to 0.91, while reducing the false alarm rate (FAR) from 4.88% to 2.85%.
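The abstract combines three ideas: splitting training data into tasks by disk properties (MDP), meta-learning across those tasks (MAML), and evaluating with FDR and FAR. The sketch below illustrates how these pieces can fit together under stated assumptions; it is not the authors' MDP-MAML implementation. Assumptions: tasks are formed by grouping samples on two hypothetical properties (disk model and time window); the base learner is plain logistic regression; first-order MAML (which drops MAML's second-order gradient terms) stands in for full MAML; and the names `partition_by_properties`, `adapt`, `fomaml`, and `fdr_far` are hypothetical.

```python
import numpy as np
from collections import defaultdict


def partition_by_properties(X, y, disk_model, time_window):
    """Hypothetical MDP-style split: one task per (disk model, time window) pair."""
    groups = defaultdict(list)
    for i, key in enumerate(zip(disk_model, time_window)):
        groups[key].append(i)
    # Keep only tasks large enough to split into support and query halves.
    return {k: (X[idx], y[idx]) for k, idx in groups.items() if len(idx) >= 2}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_grad(w, X, y):
    """Gradient of the mean binary cross-entropy of a logistic-regression model."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def adapt(w, X, y, inner_lr=0.1, inner_steps=3):
    """Inner loop: a few gradient steps starting from the meta-weights."""
    w = w.copy()
    for _ in range(inner_steps):
        w -= inner_lr * bce_grad(w, X, y)
    return w


def fomaml(tasks, dim, outer_lr=0.05, epochs=100, seed=0):
    """First-order MAML (no second-order terms): adapt on a support half of
    each task, then step the meta-weights with the query-half gradient
    averaged over all tasks."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(epochs):
        meta_grad = np.zeros(dim)
        for X, y in tasks.values():
            perm = rng.permutation(len(y))
            support, query = perm[: len(y) // 2], perm[len(y) // 2 :]
            w_task = adapt(w, X[support], y[support])
            meta_grad += bce_grad(w_task, X[query], y[query])
        w -= outer_lr * meta_grad / len(tasks)
    return w


def fdr_far(y_true, y_pred):
    """FDR = detected failures / all failing disks;
    FAR = false alarms / all healthy disks (label 1 = failing)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)
```

When a new issue appears (for example, a disk model absent from training), `adapt(w, X_new, y_new)` fine-tunes the meta-learned weights on a few labeled samples from that new task; this mirrors, in a simplified way, the abstract's claim that the model can incorporate emerging issues.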

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Contributions

Yunchuan GUAN designed the research and conducted the experiments. Yunchuan GUAN and Yu LIU drafted the paper. Ke ZHOU helped organize the paper. Qiang LI, Tuanjie WANG, and Hui LI provided the data and funded the research. Yunchuan GUAN revised and finalized the paper.

Corresponding author

Correspondence to Yu LIU (刘渝).

Ethics declarations

Yunchuan GUAN, Yu LIU, Ke ZHOU, Qiang LI, Tuanjie WANG, and Hui LI declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61902135) and the Shandong Provincial Natural Science Foundation, China (No. ZR2019LZH003).

About this article

Cite this article

Guan, Y., Liu, Y., Zhou, K. et al. A disk failure prediction model for multiple issues. Front Inform Technol Electron Eng 24, 964–979 (2023). https://doi.org/10.1631/FITEE.2200488

  • DOI: https://doi.org/10.1631/FITEE.2200488
