research-article

Utility Cloud: A Novel Approach for Diagnosis and Self-healing Based on the Uncertainty in Anomalous Metrics

Authors:
Ameen Alkasem, Hongwei Liu, Decheng Zuo

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

ICMSS '17: Proceedings of the 2017 International Conference on Management Engineering, Software Engineering and Service Sciences
January 2017, Pages 99–107
https://doi.org/10.1145/3034950.3034967

Published: 14 January 2017

ABSTRACT

Diagnosing and self-healing anomalies are important aspects of operating data centers and utility-cloud computing. As data centers grow larger and host heterogeneous, complex applications, software, and workloads, tasks such as anomaly/fault detection require automatic operation with runtime self-healing, and they must work without prior knowledge of normal or anomalous behavior. Diagnosis must also operate at different levels of abstraction, such as hardware and software, and over multiple time-series metrics in environments like cloud computing. Classifying and analyzing metrics qualitatively, by their distribution rather than by individual metric thresholds, offers new methods for anomaly and fault diagnosis. Categorical counterparts (binning) and the normal distribution are used as a measure of how concentrated or dispersed a distribution is, so that raw metric data gathered across the cloud stack can be turned into entropy time series (i.e., symptoms). These symptoms can be organized hierarchically and across many cloud ecosystems to increase scalability, and they can also address the uncertainty in anomaly data. This is possible because the multi-valued decision diagram (MDD), a structure for Multiple-Valued Logic (MVL), is shown to be highly efficient for rapid analysis and is adapted to integrate the effects of multi-valued measurements. A Naive Bayes Classifier (NBC) combined with an influence diagram (ID) is used to build a time-series diagnosis technique that categorizes and detects anomalies/faults at runtime and approximates the influence of each self-healing system component on system functioning and reliability. Experimental results for utility cloud service scenarios show the viability of the presented approach.
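The pipeline described above (binning raw metrics, summarizing each window by the entropy of its binned distribution, and classifying the resulting symptoms with a Naive Bayes classifier) can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: the fixed bin edges, the non-overlapping windows, the names `bin_value`, `window_entropy`, `entropy_series` and `CategoricalNaiveBayes`, and the use of Laplace smoothing are all assumptions made for the example. The normal-distribution binning, the influence diagram, and the MDD stages are not modeled.

```python
import math
from collections import Counter, defaultdict


def bin_value(x, edges):
    """Map a raw metric sample to a categorical bin index (assumed sorted, fixed edges)."""
    for i, edge in enumerate(edges):
        if x < edge:
            return i
    return len(edges)


def window_entropy(samples, edges):
    """Shannon entropy (bits) of the binned distribution of one window.

    Low entropy: samples concentrated in few bins.
    High entropy: samples dispersed across many bins.
    """
    counts = Counter(bin_value(x, edges) for x in samples)
    n = len(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_series(series, window, edges):
    """Turn a raw metric series into an entropy time series, one value per window."""
    return [window_entropy(series[i:i + window], edges)
            for i in range(0, len(series) - window + 1, window)]


class CategoricalNaiveBayes:
    """Naive Bayes over discrete symptom levels with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v]: how often feature j took value v in class c
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.levels = defaultdict(set)  # distinct values seen per feature
        for xs, c in zip(X, y):
            for j, v in enumerate(xs):
                self.counts[c][j][v] += 1
                self.levels[j].add(v)
        return self

    def predict(self, xs):
        """Return the class with the highest log-posterior for one symptom vector."""
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)  # log prior
            for j, v in enumerate(xs):
                num = self.counts[c][j][v] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.levels[j])
                score += math.log(num / den)  # log smoothed likelihood
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical usage: CPU utilization in [0, 1] binned at 0.3 and 0.7.
cpu_entropy = entropy_series([0.2, 0.2, 0.2, 0.2, 0.1, 0.5, 0.9, 0.5], 4, [0.3, 0.7])

# Hypothetical training rows: (CPU symptom level, memory symptom level).
X = [(0, 0), (0, 1), (2, 2), (2, 1)]
y = ["normal", "normal", "anomaly", "anomaly"]
model = CategoricalNaiveBayes().fit(X, y)
```

Here each feature of a symptom vector is assumed to be an entropy value that has itself been binned into a few levels (for example 0 = low, 1 = medium, 2 = high), which is one plausible way to obtain multi-valued states; the training rows are made up. Smoothing keeps a single level that never appeared with a class from driving that class's probability to zero.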
The presented approach outperforms other methods, with an average improvement of 0.89% in anomaly-diagnosis accuracy over threshold methods and an average improvement of 0.04% in false alarm rate over the near-optimum threshold method.
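The results above are stated in terms of diagnosis accuracy and false alarm rate. The abstract does not spell out how these are computed, so the following sketch assumes the usual confusion-matrix definitions, accuracy = (TP + TN) / N and false alarm rate = FP / (FP + TN), with "anomaly" as the positive class. The label strings and function names are hypothetical. Under this assumption, an improvement between two detectors would be the difference between their values on the same labeled windows.

```python
def confusion_counts(y_true, y_pred, positive="anomaly"):
    """Count (TP, FP, TN, FN), treating `positive` as the anomalous class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1  # false alarm: a normal window flagged as anomalous
        elif actual == positive:
            fn += 1  # missed anomaly
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred, positive="anomaly"):
    """Fraction of windows classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_alarm_rate(y_true, y_pred, positive="anomaly"):
    """Fraction of truly normal windows flagged as anomalous: FP / (FP + TN)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred, positive)
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Hypothetical labels for five monitoring windows.
y_true = ["normal", "normal", "anomaly", "anomaly", "normal"]
y_pred = ["normal", "anomaly", "anomaly", "normal", "normal"]
```

For these made-up labels there is one true positive, one false alarm, one missed anomaly and two true negatives, so accuracy is 3/5 and the false alarm rate is 1/3.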

Published in
ICMSS '17: Proceedings of the 2017 International Conference on Management Engineering, Software Engineering and Service Sciences
January 2017
339 pages
ISBN: 9781450348348
DOI: 10.1145/3034950
Editor:
Yulin Wang
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Fault Diagnosis
Multi-State System
Multi-valued Decision Diagram
Self-healing
VMs
Qualifiers
- research-article
- Research
- Refereed limited