Abstract
Datacenters are growing in size and complexity to the point where anomaly detection and infrastructure monitoring have become critical challenges. One strategy for improving the reliability of computational nodes in a datacenter is to identify cluster nodes or virtual machines that exhibit anomalous behavior. In this paper, we introduce a novel clustering approach for analyzing the behavior of cluster nodes running various workloads, based on resource usage details (CPU utilization, network events, etc.). The new technique aims to improve the efficiency of fuzzy clustering algorithms based on the maximum likelihood estimation (MLE) scheme. We propose the use of a recently developed object-to-group distance, which assigns each object to the most appropriate group without computing distances between all pairs of objects. Experimental findings under realistic settings show that the proposed algorithm outperforms several comparable algorithms commonly used for such tasks.
Data availability
The data that support the findings of this paper are available from the corresponding author, Saloua El Motaki, upon reasonable request.
References
Abdelsalam M, Krishnan R, Sandhu R (2017) Clustering-based iaas cloud monitoring. In: 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), pages 672–679. https://doi.org/10.1109/CLOUD.2017.90
Anthony A, Benjamin A, Jim B, Ann G, Sophia L, Steve M, Jeff O, Mahesh R, Joel S (2015) Toward rapid understanding of production hpc applications and systems. In: 2015 IEEE International Conference on Cluster Computing, pages 464–473. https://doi.org/10.1109/CLUSTER.2015.71
Amruthnath N, Gupta T (2018) A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In: 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), pages 355–361. https://doi.org/10.1109/IEA.2018.8387124
Bari MF, Boutaba R, Esteves R, Granville LZ, Podlesny M, Rabbani MG, Zhang Q, Zhani MF (2013) Data center network virtualization: a survey. IEEE Commun. Surv. Tutor. 15(2):909–928
Bashir M, Irfan A, Hassan U, Muhammad Y (2019) Failure prediction using machine learning in a virtualised hpc system and application. Clust. Comput. 22:471–485. https://doi.org/10.1007/s10586-019-02917-1 (ISSN 1573-7543)
Bhatele A, Mohror K, Langer SH, Isaacs KE (2013) There goes the neighborhood: performance degradation due to nearby jobs. In: SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 1–12. https://doi.org/10.1145/2503210.2503247
Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhya: The Indian Journal of Statistics (1933-1960), 7(4):401–406, ISSN 00364452. URL http://www.jstor.org/stable/25047882
Bi J, Yuan H, Zhang LB, Zhang J (2019) Sgw-scn: an integrated machine learning approach for workload forecasting in geo-distributed cloud data centers. Information Sciences, 481:57–68, ISSN 0020-0255. https://doi.org/10.1016/j.ins.2018.12.027. URL https://www.sciencedirect.com/science/article/pii/S0020025518309642
Brandt J, Chen F, De Sapio V, Gentile A, Mayo J, Pèbay P, Roe Di, Thompson D, Wong M (2010) Quantifying effectiveness of failure prediction and response in hpc systems: methodology and example. In: 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W), pages 2–7. https://doi.org/10.1109/DSNW.2010.5542629
Daradkeh T, Agarwal A, Zaman M, Goel N (2020) Dynamic k-means clustering of workload and cloud resource configuration for cloud elastic model. IEEE Access 8:219430–219446. https://doi.org/10.1109/ACCESS.2020.3042716
Egele M, Woo M, Chapman P, Brumley D (2014) Blanket execution: dynamic similarity testing for program binaries and components. In: 23rd USENIX Security Symposium (USENIX Security 14), pages 303–317
El Motaki S, Yahyaouy A, Gualous H, Sabor J (2019) Gath-geva clustering algorithm for high performance computing (hpc) monitoring. In: 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), pages 1–6
El Motaki S, Yahyaouy A, Gualous H, Sabor J (2021) A new weighted fuzzy c-means clustering for workload monitoring in cloud datacenter platforms. Clust. Comput. 24(4):3367–3379. https://doi.org/10.1007/s10586-021-03331-2 (ISSN 1573-7543)
Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern. Anal. Mach. Intell. 11(7):773–780
Genuer R, Poggi J-M, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit. Lett. 31(14):2225–2236
Gustafson D, Kessel W (1978) Fuzzy clustering with a fuzzy covariance matrix. 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, pages 761–766
Hubert L, Arabie P (1985) Comparing partitions. J. Classif. 2(1):193–218. https://doi.org/10.1007/BF01908075 (ISSN 1432-1343)
Hui Y (2018) A virtual machine anomaly detection system for cloud computing infrastructure. J. Supercomput. 21:6126–6134. https://doi.org/10.1007/s11227-018-2518-z
Ismaeel S, Miri A, Al-Khazraji A (2016) Energy-consumption clustering in cloud data centre. In: 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC), pages 1–6
Khan A, Yan X, Tao S, Anerousis N (2012) Workload characterization and prediction in the cloud: A multiple time series approach. In: 2012 IEEE Network Operations and Management Symposium, pages 1287–1294
Lorido-Botran T, Huerta S, Tomás L, Tordsson J, Sanz B (2017) An unsupervised approach to online noisy-neighbor detection in cloud data centers. Expert Systems with Applications, 89:188–204, ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2017.07.038. https://www.sciencedirect.com/science/article/pii/S0957417417305158. Accessed 6 June 2022
Nasibov EN, Ulutagay G (2009) Robustness of density-based clustering methods with various neighborhood relations. Fuzzy Sets Syst. 160(24):3601–3615
Pandeeswari N, Kumar G (2016) Anomaly detection system in cloud environment using fuzzy clustering based ann. Mob. Netw. Appl. 21:494–505. https://doi.org/10.1007/s11036-015-0644-x
Rugwiro U, Chunhua G (2017) Customization of virtual machine allocation policy using k-means clustering algorithm to minimize power consumption in data centers. In: Proceedings of the Second International Conference on Internet of Things, Data and Cloud Computing, New York, NY, USA, Association for Computing Machinery. ISBN 9781450347747. https://doi.org/10.1145/3018896.3018947
Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 100(5):401–409
Sauvanaud C, Kaâniche M, Kanoun K, Lazri K, Da Silva SG (2018) Anomaly detection and diagnosis for cloud services: Practical experiments and lessons learned. Journal of Systems and Software, 139:84–106, ISSN 0164-1212. https://doi.org/10.1016/j.jss.2018.01.039. https://www.sciencedirect.com/science/article/pii/S0164121218300256. Accessed 2 June 2022
Shirazi N, Simpson S, Marnerides AK, Watson M, Mauthe A, Hutchison D (2014) Assessing the impact of intra-cloud live migration on anomaly detection. In: 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet), pages 52–57, https://doi.org/10.1109/CloudNet.2014.6968968
Snir M, Wisniewski R W, Abraham JA, Adve SV, Saurabh B, Pavan B, Jim B, Pradip B, Franck C, Bill C, Chien AA, Paul C, Debardeleben NA, Diniz PC, Christian E, Mattan E, Saverio F, Al G, Rinku G, Fred J, Sriram K, Sven L, Dean L, Subhasish M, Todd M, Rob S, Jon S, Eric Van H (2014) Addressing failures in exascale computing. Int. J. High Perform. Comput. Appl. 28(2):129–173. https://doi.org/10.1177/1094342014522573 (ISSN 1094-3420)
Tavakkol B, Jeong Myong K, Albin Susan L (2017) Object-to-group probabilistic distance measure for uncertain data classification. Neurocomputing 230:143–151. https://doi.org/10.1016/j.neucom.2016.12.007 (ISSN 0925-2312)
Tuncer O, Ates EC, Zhang Y, Turk A, Brandt JM, Leung VJ, Egele M, Coskun AK (2017) Diagnosing performance variations in hpc applications using machine learning. In: ISC. https://doi.org/10.1007/978-3-319-58667-0_19
Xiao X, Sun J, Yang J (2021) Operation and maintenance (O&M) for data center: an intelligent anomaly detection approach. Computer Communications, 178:141–152. ISSN 0140-3664. https://doi.org/10.1016/j.comcom.2021.06.030. https://www.sciencedirect.com/science/article/pii/S0140366421002541. Accessed 2 June 2022
Zhang X, Meng F, Chen P, Xu J (2016) Taskinsight: A fine-grained performance anomaly detection and problem locating system. In: 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), pages 917–920. https://doi.org/10.1109/CLOUD.2016.0136
Acknowledgements
The experimental work was carried out on the HPC-MARWAN computing cluster of the Mohammed V University in Rabat, Morocco.
Funding
The authors of this paper have not received any financial support for research, authorship and/or publication of this article.
Author information
Contributions
Conceptualization: SEM; Formal analysis and implementation: SEM; Writing—original draft preparation: SEM; Writing—review and editing: SEM and BH; Supervision: AY.
Ethics declarations
Conflict of interest
The authors of this paper declare that they have no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: p-Friedman test
The Friedman test is an extension of the Wilcoxon Signed Rank Test and the non-parametric equivalent of the one-factor analysis of variance with repeated measures (Hubert and Arabie 1985). Its null hypothesis is that the k dependent samples come from the same population. Denoting the location parameter of sample i by \(M_{i}\), the null hypothesis \(H_{0}\) and the alternative hypothesis \(H_{a}\) are:
$$H_{0}: M_{1} = M_{2} = \cdots = M_{k}, \qquad H_{a}: M_{i} \ne M_{j} \ \text{for at least one pair } (i, j).$$
Under the null hypothesis, with n denoting the number of repeated measurements (blocks), the expected rank sum of each group is \(\frac{n(k + 1)}{2}\). The Friedman test statistic, written here in its standard form, is:
$$Q = \frac{12}{n k (k + 1)} \sum_{i=1}^{k} R_{i}^{2} - 3 n (k + 1),$$
where \(R_{i}\) is the sum of the ranks for sample i.
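As an illustration, the rank sums \(R_{i}\) and the Friedman statistic can be computed with a few lines of Python. This is a minimal sketch of the standard textbook statistic (without the correction factor for ties), not the implementation used in the experiments; the function names `rank_row` and `friedman_statistic` and the example scores are hypothetical.

```python
def rank_row(row):
    """Rank the values of one block (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][i] is the measurement of sample i in block b.

    Returns (Q, rank_sums) with
    Q = 12 / (n k (k + 1)) * sum_i R_i^2 - 3 n (k + 1),
    where n is the number of blocks and k the number of samples.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for i, r in enumerate(rank_row(row)):
            rank_sums[i] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


# Hypothetical scores: 4 blocks (rows), 3 compared samples (columns).
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.75],
    [0.90, 0.85, 0.80],
]
q, rank_sums = friedman_statistic(scores)
print(q, rank_sums)
```

When all rank sums equal the expected value \(\frac{n(k + 1)}{2}\), the statistic evaluates to zero; larger values indicate stronger disagreement with the null hypothesis. For data with many ties, a tie-corrected variant of the statistic is usually preferred.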
Cite this article
El Motaki, S., Hirchoua, B. & Yahyaouy, A. A new fuzzy MLE-clustering approach based on object-to-group probabilistic distance measure: from anomaly detection to multi-fault classification in datacenter computational nodes. J Ambient Intell Human Comput 14, 12697–12708 (2023). https://doi.org/10.1007/s12652-022-04205-0
DOI: https://doi.org/10.1007/s12652-022-04205-0