Failure prediction of tasks in the cloud at an earlier stage: a solution based on domain information mining

Liu, Chunhong; Dai, Liping; Lai, Yi; Lai, Guibing; Mao, Wentao

doi:10.1007/s00607-020-00800-1

Published: 19 February 2020

Computing, Volume 102, pages 2001–2023 (2020)

Chunhong Liu (ORCID: 0000-0001-7364-0568)^1,2, Liping Dai^1, Yi Lai^3, Guibing Lai^1 & Wentao Mao^1,2

Abstract

In a large-scale data center, it is vital to recognize the termination statuses of applications precisely and at an early stage. In recent years, many machine learning techniques have been applied to this problem, which benefits scheduling-policy optimization and improves the efficiency of resource utilization. However, if the application's dynamic information is insufficient at the early stage, the generalization performance of the machine learning model degrades, and the prediction accuracy can be low. To overcome this problem, this paper proposes a novel failure prediction method that exploits the association relationships between similar jobs to jointly predict tasks' termination statuses at an earlier stage. Among similar jobs whose tasks show similar patterns of change in consumed resources, an inherent structural correlation may exist, and this correlation information is significant for improving the prediction model's generalization performance. First, a job clustering algorithm is proposed to identify the jobs with higher similarity among jobs that have various numbers of tasks. Second, based on the job clustering results, a robust multi-task learning algorithm is introduced to effectively utilize the domain information among jobs (i.e., the interactional relationships among jobs with respect to the termination statuses of their tasks). Experiments are conducted on a Google cluster workload traces dataset. The results show that, at one-third of the tasks' running time, the proposed method achieves higher prediction accuracy, a lower misjudgment rate, and higher predictive stability than several state-of-the-art methods.
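
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices here are assumptions made only for illustration: each job is summarized by the per-feature mean and standard deviation of its tasks' early-stage resource measurements (so that jobs with different numbers of tasks become comparable), plain k-means stands in for the paper's job clustering algorithm, and the robust multi-task learner uses one common formulation in which the weight matrix is split into a low-rank part shared across jobs and a column-sparse part for outlier jobs, fitted with a least-squares loss by proximal gradient descent. All function names (`job_signature`, `kmeans`, `robust_mtl`, `predict_status`) are hypothetical.

```python
import numpy as np

def job_signature(task_features):
    """Summarize one job (any number of tasks) as a fixed-length vector.
    Assumption: per-feature mean and std over the job's tasks."""
    t = np.asarray(task_features, dtype=float)
    return np.concatenate([t.mean(axis=0), t.std(axis=0)])

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means, used here only as a stand-in for the job clustering step."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster is empty
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def prox_trace(M, tau):
    """Proximal operator of tau * trace norm: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_group_columns(M, tau):
    """Proximal operator of tau * sum of column l2 norms (one column per job)."""
    norms = np.linalg.norm(M, axis=0, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale

def robust_mtl(Xs, ys, lam_low_rank=0.1, lam_sparse=0.1, iters=300):
    """Fit W = L + S for the jobs of one cluster.
    Xs[j]: (n_j, d) early-stage task features of job j.
    ys[j]: (n_j,) labels, +1 for failed and -1 for finished (assumed encoding).
    L is pushed toward low rank, S toward having few non-zero columns."""
    d, m = Xs[0].shape[1], len(Xs)
    L, S = np.zeros((d, m)), np.zeros((d, m))
    # The smooth loss depends on L + S, so the joint Lipschitz constant doubles.
    lip = 2.0 * max(np.linalg.norm(X, 2) ** 2 / len(X) for X in Xs)
    step = 1.0 / lip
    for _ in range(iters):
        W = L + S
        # Column j holds the least-squares gradient for job j.
        G = np.column_stack([
            X.T @ (X @ W[:, j] - y) / len(X)
            for j, (X, y) in enumerate(zip(Xs, ys))
        ])
        L = prox_trace(L - step * G, step * lam_low_rank)
        S = prox_group_columns(S - step * G, step * lam_sparse)
    return L, S

def predict_status(L, S, job_index, X):
    """Predict +1 (fail) or -1 (finish) for the tasks of one job."""
    return np.sign(X @ (L + S)[:, job_index])
```

In such a pipeline, `job_signature` would be applied to every job, `kmeans` would group the resulting vectors, and `robust_mtl` would be run once per cluster so that only similar jobs share information. The regularization weights `lam_low_rank` and `lam_sparse` are placeholders that would need tuning on real trace data.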

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. U1704158), the China Postdoctoral Science Foundation Special Support (No. 2016T90944), the Doctoral Research Project of Henan Normal University (No. 5101119170145), and the Science and Technology Research Project of Henan Province (No. 172102210045).

Author information

Authors and Affiliations

School of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
Chunhong Liu, Liping Dai, Guibing Lai & Wentao Mao
Engineering Lab of Intelligence Business and Internet of Things, Xinxiang, 453007, China
Chunhong Liu & Wentao Mao
College of Food Science and Nutritional Engineering, China Agricultural University, Beijing, 100089, China
Yi Lai

Corresponding authors

Correspondence to Chunhong Liu or Wentao Mao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, C., Dai, L., Lai, Y. et al. Failure prediction of tasks in the cloud at an earlier stage: a solution based on domain information mining. Computing 102, 2001–2023 (2020). https://doi.org/10.1007/s00607-020-00800-1

Received: 25 October 2019
Accepted: 14 February 2020
Published: 19 February 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00607-020-00800-1

Keywords

Mathematics Subject Classification

68T10
