ABSTRACT
Anomaly diagnosis and self-healing are important aspects of operations in data centers and utility clouds. As data centers grow in size and host heterogeneous, complex applications, software, and workloads, anomaly/fault detection must operate automatically, with self-healing at runtime. Moreover, it must do so without requiring prior knowledge of normal or anomalous performance. In environments such as cloud computing, diagnosis must function at different levels of abstraction, such as hardware and software, and across multiple time-series metrics. Classifying and analyzing qualitative metrics offers new methods for anomaly and fault diagnosis that work on the metrics' distributions instead of on individual metric thresholds. Categorical counterparts (binning) and the normal distribution are used to build a measure that captures how concentrated or dispersed those distributions are, so that raw metric data gathered across the cloud stack can be turned into a time series of entropy values (the symptoms). These symptoms can be organized hierarchically and across many cloud ecosystems to increase scalability, and they can also address the uncertainty in anomaly data. This contribution is possible because the multi-valued decision diagram (MDD), a structure from Multiple-Valued Logic (MVL), is shown to be highly efficient in terms of rapid analytical performance and is adapted to integrate the effects of multi-valued measurements. A Naive Bayes Classifier (NBC) with an influence diagram (ID) is used to create a time-series diagnosis technique that categorizes and detects anomalies/faults at runtime and approximates the influence of each self-healing system component on system functioning and reliability. For utility cloud service scenarios, experimental results show the viability of the presented approach.
The presented approach outperforms other methods, with an average improvement of 0.89% in anomaly-diagnosis accuracy over threshold methods and an average improvement of 0.04% in false alarm rate over the near-optimum threshold method.
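The binning-and-entropy step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal-width bins, Shannon entropy in bits, and a fixed-length sliding window, and the names `bin_index`, `shannon_entropy`, and `entropy_series` are hypothetical. The normal-distribution component, the MDD, and the NBC stages are not modeled here.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a raw metric value to a categorical bin in [0, n_bins - 1].

    Assumption: equal-width bins over [lo, hi]; out-of-range values are clamped.
    """
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_series(samples, window, n_bins, lo, hi):
    """Turn raw metric samples into an entropy time series.

    Each output value is the entropy of the binned samples in one sliding
    window: low values mean the metric is concentrated in few bins, high
    values mean it is dispersed across many.
    """
    bins = [bin_index(v, lo, hi, n_bins) for v in samples]
    return [
        shannon_entropy(bins[i:i + window])
        for i in range(len(bins) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical CPU-utilization samples (percent): steady, then erratic.
    cpu = [10, 10, 10, 10, 90, 30, 60, 10]
    print(entropy_series(cpu, window=4, n_bins=4, lo=0, hi=100))
```

In this sketch a steady metric yields an entropy near zero, while a metric that suddenly spreads across bins raises the entropy, which is the kind of distribution-level symptom the abstract contrasts with per-metric thresholds.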