A new fuzzy MLE-clustering approach based on object-to-group probabilistic distance measure: from anomaly detection to multi-fault classification in datacenter computational nodes

El Motaki, Saloua; Hirchoua, Badr; Yahyaouy, Ali

doi:10.1007/s12652-022-04205-0

Original Research
Published: 02 July 2022

Volume 14, pages 12697–12708 (2023)

Journal of Ambient Intelligence and Humanized Computing

Abstract

Datacenters are expanding in size and complexity to the point where anomaly detection and infrastructure monitoring have become critical challenges. One strategy for improving the reliability of computational nodes in a datacenter is to identify cluster nodes or virtual machines that exhibit anomalous behavior. In this paper, we introduce a novel clustering approach that analyzes cluster node behavior under various workloads, based on resource usage details (CPU utilization, network events, etc.). The new technique aims to boost the efficiency of fuzzy clustering algorithms based on the maximum likelihood estimation (MLE) scheme. We propose the use of a recently developed object-to-group distance because it assigns each object to the most appropriate group without computing distances among all pairs of objects. Experimental findings under realistic settings show that the proposed algorithm outperforms many similar algorithms frequently used for such tasks.

Data availability

The data that support the findings of this paper are available from the corresponding author, Saloua El Motaki, upon reasonable request.

References

Abdelsalam M, Krishnan R, Sandhu R (2017) Clustering-based iaas cloud monitoring. In: 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), pages 672–679. https://doi.org/10.1109/CLOUD.2017.90
Anthony A, Benjamin A, Jim B, Ann G, Sophia L, Steve M, Jeff O, Mahesh R, Joel S (2015) Toward rapid understanding of production hpc applications and systems. In: 2015 IEEE International Conference on Cluster Computing, pages 464–473. https://doi.org/10.1109/CLUSTER.2015.71
Amruthnath N, Gupta T (2018) A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In: 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), pages 355–361. https://doi.org/10.1109/IEA.2018.8387124
Bari MF, Boutaba R, Esteves R, Granville LZ, Podlesny M, Rabbani MG, Zhang Q, Zhani MF (2013) Data center network virtualization: a survey. IEEE Commun. Surv. Tutor. 15(2):909–928
Bashir M, Irfan A, Hassan U, Muhammad Y (2019) Failure prediction using machine learning in a virtualised hpc system and application. Clust. Comput. 22:471–485. https://doi.org/10.1007/s10586-019-02917-1 (ISSN 1573-7543)
Bhatele A, Mohror K, Langer SH, Isaacs KE (2013) There goes the neighborhood: performance degradation due to nearby jobs. In: SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 1–12. https://doi.org/10.1145/2503210.2503247
Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhya: The Indian Journal of Statistics (1933-1960), 7(4):401–406, ISSN 00364452. URL http://www.jstor.org/stable/25047882
Bi J, Yuan H, Zhang LB, Zhang J (2019) Sgw-scn: an integrated machine learning approach for workload forecasting in geo-distributed cloud data centers. Information Sciences, 481:57–68, ISSN 0020-0255. https://doi.org/10.1016/j.ins.2018.12.027. URL https://www.sciencedirect.com/science/article/pii/S0020025518309642
Brandt J, Chen F, De Sapio V, Gentile A, Mayo J, Pébay P, Roe D, Thompson D, Wong M (2010) Quantifying effectiveness of failure prediction and response in hpc systems: methodology and example. In: 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W), pages 2–7. https://doi.org/10.1109/DSNW.2010.5542629
Daradkeh T, Agarwal A, Zaman M, Goel N (2020) Dynamic k-means clustering of workload and cloud resource configuration for cloud elastic model. IEEE Access 8:219430–219446. https://doi.org/10.1109/ACCESS.2020.3042716
Egele M, Woo M, Chapman P, Brumley D (2014) Blanket execution: dynamic similarity testing for program binaries and components. In: 23rd USENIX Security Symposium (USENIX Security 14), pages 303–317
El Motaki S, Yahyaouy A, Gualous H, Sabor J (2019) Gath-geva clustering algorithm for high performance computing (hpc) monitoring. In: 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), pages 1–6
El Motaki S, Yahyaouy A, Gualous H, Sabor J (2021) A new weighted fuzzy c-means clustering for workload monitoring in cloud datacenter platforms. Clust. Comput. 24(4):3367–3379. https://doi.org/10.1007/s10586-021-03331-2 (ISSN 1573-7543)
Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern. Anal. Mach. Intell. 11(7):773–780
Genuer R, Poggi J-M, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit. Lett. 31(14):2225–2236
Gustafson D, Kessel W (1978) Fuzzy clustering with a fuzzy covariance matrix. 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, pages 761–766
Hubert L, Arabie P (1985) Comparing partitions. J. Classif. 2(1):193–218. https://doi.org/10.1007/BF01908075 (ISSN 1432-1343)
Hui Y (2018) A virtual machine anomaly detection system for cloud computing infrastructure. J. Supercomput. 21:6126–6134. https://doi.org/10.1007/s11227-018-2518-z
Ismaeel S, Miri A, Al-Khazraji A (2016) Energy-consumption clustering in cloud data centre. In: 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC), pages 1–6
Khan A, Yan X, Tao S, Anerousis N (2012) Workload characterization and prediction in the cloud: A multiple time series approach. In: 2012 IEEE Network Operations and Management Symposium, pages 1287–1294
Lorido-Botran T, Huerta S, Tomás L, Tordsson J, Sanz B (2017) An unsupervised approach to online noisy-neighbor detection in cloud data centers. Expert Systems with Applications, 89:188–204, ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2017.07.038. https://www.sciencedirect.com/science/article/pii/S0957417417305158. Accessed 6 June 2022
Nasibov EN, Ulutagay G (2009) Robustness of density-based clustering methods with various neighborhood relations. Fuzzy Sets Syst. 160(24):3601–3615
Pandeeswari N, Kumar G (2016) Anomaly detection system in cloud environment using fuzzy clustering based ann. Mob. Netw. Appl. 21:494–505. https://doi.org/10.1007/s11036-015-0644-x
Rugwiro U, Chunhua G (2017) Customization of virtual machine allocation policy using k-means clustering algorithm to minimize power consumption in data centers. In: Proceedings of the Second International Conference on Internet of Things, Data and Cloud Computing, New York, NY, USA, Association for Computing Machinery. ISBN 9781450347747. https://doi.org/10.1145/3018896.3018947
Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 100(5):401–409
Sauvanaud C, Kaâniche M, Kanoun K, Lazri K, Da Silva SG (2018) Anomaly detection and diagnosis for cloud services: Practical experiments and lessons learned. Journal of Systems and Software, 139:84–106, ISSN 0164-1212. https://doi.org/10.1016/j.jss.2018.01.039. https://www.sciencedirect.com/science/article/pii/S0164121218300256. Accessed 2 June 2022
Shirazi N, Simpson S, Marnerides AK, Watson M, Mauthe A, Hutchison D (2014) Assessing the impact of intra-cloud live migration on anomaly detection. In: 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet), pages 52–57, https://doi.org/10.1109/CloudNet.2014.6968968
Snir M, Wisniewski R W, Abraham JA, Adve SV, Saurabh B, Pavan B, Jim B, Pradip B, Franck C, Bill C, Chien AA, Paul C, Debardeleben NA, Diniz PC, Christian E, Mattan E, Saverio F, Al G, Rinku G, Fred J, Sriram K, Sven L, Dean L, Subhasish M, Todd M, Rob S, Jon S, Eric Van H (2014) Addressing failures in exascale computing. Int. J. High Perform. Comput. Appl. 28(2):129–173. https://doi.org/10.1177/1094342014522573 (ISSN 1094-3420)
Tavakkol B, Jeong Myong K, Albin Susan L (2017) Object-to-group probabilistic distance measure for uncertain data classification. Neurocomputing 230:143–151. https://doi.org/10.1016/j.neucom.2016.12.007 (ISSN 0925-2312)
Tuncer O, Ates EC, Zhang Y, Turk A, Brandt JM, Leung VJ, Egele M, Coskun AK (2017) Diagnosing performance variations in hpc applications using machine learning. In: ISC. https://doi.org/10.1007/978-3-319-58667-0_19
Xiao X, Sun J, Yang J (2021) Operation and maintenance (O&M) for data center: an intelligent anomaly detection approach. Computer Communications, 178:141–152. ISSN 0140-3664. https://doi.org/10.1016/j.comcom.2021.06.030. https://www.sciencedirect.com/science/article/pii/S0140366421002541. Accessed 2 June 2022
Zhang X, Meng F, Chen P, Xu J (2016) Taskinsight: A fine-grained performance anomaly detection and problem locating system. In: 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), pages 917–920. https://doi.org/10.1109/CLOUD.2016.0136

Acknowledgements

The experimental work was carried out on the HPC-MARWAN computing cluster of the Mohammed V University in Rabat, Morocco.

Funding

The authors of this paper have not received any financial support for research, authorship and/or publication of this article.

Author information

Badr Hirchoua and Ali Yahyaouy have contributed equally to this work.

Authors and Affiliations

Computer Science, University Sidi Mohammed Ben Abdellah-USMBA, Fez, Morocco
Saloua El Motaki & Ali Yahyaouy
Computer Science, National Higher School of Arts and Crafts (ENSAM), Hassan II University, Casablanca, Morocco
Badr Hirchoua

Authors

Saloua El Motaki
Badr Hirchoua
Ali Yahyaouy

Contributions

Conceptualization: SEM; Formal analysis and implementation: SEM; Writing—original draft preparation: SEM; Writing—review and editing: SEM and BH; Supervision: AY.

Corresponding author

Correspondence to Saloua El Motaki.

Ethics declarations

Conflict of interest

The authors of this paper declare that they have no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: p-Friedman test

The Friedman test is an extension of the Wilcoxon Signed Rank Test and the non-parametric equivalent of the one-factor analysis of variance with repeated measures (Hubert and Arabie 1985). Its null hypothesis is that k dependent samples come from the same population. Denoting the location parameter of sample i by $M_{i}$, the null hypothesis $H_{0}$ and the alternative hypothesis $H_{a}$ are stated as follows:

$$\begin{aligned}&H_{0} : M_{1}= M_{2}=\ldots = M_{k} \\&H_{a} : M_{i} \ne M_{j} \text { for at least one pair } \left( i,j \right) \end{aligned}$$

Under the Friedman null hypothesis, the expected rank sum of each group equals $\frac{n(k + 1)}{2}$. The Friedman test statistic is expressed as follows:

$$\begin{aligned} Q=\frac{12}{nk(k+1)}\sum _{j=1}^{k}\left\{ R_{j} -\frac{n(k+1)}{2} \right\} ^{2} \end{aligned}$$

(15)

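where $R_{j}$ is the sum of the ranks for sample j, n is the number of blocks (matched observations) that are ranked, and k is the number of samples being compared.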
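
As an illustration of Eq. (15), the following minimal sketch computes the rank sums and the statistic Q in plain Python. The function names (`average_ranks`, `friedman_q`) and the score table are hypothetical and chosen only for this example; they are not taken from the paper's experiments. Tied values within a block receive the mean of their ranks, and no tie correction is applied to Q, since Eq. (15) does not include one.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def friedman_q(scores):
    """Return (Q, rank_sums) for scores[b][j]: n blocks (rows), k samples (columns)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:  # rank the k samples within each block
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    expected = n * (k + 1) / 2  # expected rank sum under H0
    q = 12 / (n * k * (k + 1)) * sum((r - expected) ** 2 for r in rank_sums)
    return q, rank_sums


# Hypothetical scores: 4 blocks (rows) by 3 compared methods (columns).
scores = [
    [0.61, 0.70, 0.82],
    [0.58, 0.66, 0.79],
    [0.64, 0.62, 0.85],
    [0.60, 0.71, 0.80],
]
q, rank_sums = friedman_q(scores)
print(rank_sums)  # [5.0, 7.0, 12.0]
print(q)          # 6.5
```

With n = 4 and k = 3, the expected rank sum is 8, the squared deviations are 9, 1, and 16, and Q = (12 / 48) × 26 = 6.5. Under $H_{0}$, Q is usually compared against a chi-squared distribution with k − 1 degrees of freedom, although that approximation is rough for so few blocks.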

About this article

Cite this article

El Motaki, S., Hirchoua, B. & Yahyaouy, A. A new fuzzy MLE-clustering approach based on object-to-group probabilistic distance measure: from anomaly detection to multi-fault classification in datacenter computational nodes. J Ambient Intell Human Comput 14, 12697–12708 (2023). https://doi.org/10.1007/s12652-022-04205-0

Received: 04 January 2022
Accepted: 15 June 2022
Published: 02 July 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s12652-022-04205-0
