
Utility Cloud: A Novel Approach for Diagnosis and Self-healing Based on the Uncertainty in Anomalous Metrics

Published: 14 January 2017

ABSTRACT

Diagnosing and self-healing anomalies is an important aspect of operations for data centers and for computing in utility clouds. Data centers of growing size, running heterogeneous and complex applications, software, and workloads, require anomaly/fault detection that operates automatically, with self-healing at runtime. Moreover, such detection must work without prior knowledge of normal or anomalous performance. Diagnosis must function at different levels of abstraction, such as hardware and software, and across multiple time-series metrics, for use in environments like cloud computing. Classifying and analyzing qualitative metrics offers new methods for anomaly and fault diagnosis that are based on the metrics' distributions rather than on individual metric thresholds alone. Categorical counterparts (binning) and the normal distribution are used to measure how concentrated or dispersed the distributions are, so that raw metric data gathered across the cloud stack can be turned into a time series of entropy (symptoms). These symptoms can be organized hierarchically and across many cloud ecosystems to increase scalability, and they can also address the uncertainty in anomaly data. This contribution is possible because the multi-valued decision diagram (MDD), as a structure for Multiple-Valued Logic (MVL), is shown to be highly efficient in terms of rapid analytical performance and is adapted to integrate the effects of measurements that take multiple values. A Naive Bayes classifier (NBC) combined with an influence diagram (ID) is used to build a time-series diagnosis technique that categorizes and detects anomalies/faults at runtime and approximates the influence of each self-healing system component on system functioning and reliability. For the utility cloud service scenarios, the experimental results show the viability of the presented approach.
The presented approach outperforms other methods, with an average improvement of 0.89% in anomaly-diagnosis accuracy over threshold methods and an average improvement of 0.04% in false alarm rate relative to the near-optimum threshold method.
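
The abstract does not give the exact binning scheme or entropy definition, so the following is only a minimal sketch of the general idea: raw metric samples are mapped to categorical bins, and Shannon entropy over a sliding window yields an entropy time series. The function names, equal-width binning, and window handling are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def bin_value(x, lo, hi, n_bins):
    """Map a raw metric value to an equal-width bin index in [0, n_bins - 1].

    Equal-width binning is an assumption; the paper's scheme is not given here.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(x, lo), hi)  # clamp out-of-range samples
    idx = int((clipped - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the upper edge falls into the last bin


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_series(values, window, lo, hi, n_bins):
    """Entropy of the binned metric over each sliding window of samples."""
    bins = [bin_value(v, lo, hi, n_bins) for v in values]
    return [
        shannon_entropy(bins[i:i + window])
        for i in range(len(bins) - window + 1)
    ]


# Hypothetical CPU-utilization samples (percent): steady, then a sudden spike.
cpu = [10, 10, 10, 10, 10, 90]
series = entropy_series(cpu, window=4, lo=0, hi=100, n_bins=4)
```

In this sketch a window of identical bins has entropy 0, and entropy rises when the spike enters the window; a change of that kind in the entropy series is what the abstract refers to as a symptom.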


  • Published in

    ICMSS '17: Proceedings of the 2017 International Conference on Management Engineering, Software Engineering and Service Sciences
    January 2017
    339 pages
    ISBN: 9781450348348
    DOI: 10.1145/3034950

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
