
Connecting the dots: anomaly and discontinuity detection in large-scale systems

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Cloud providers and data centers rely heavily on forecasts to accurately predict future workload. This information helps them virtualize appropriately and provision their infrastructure cost-effectively. The accuracy of a forecast depends greatly on the quality of the performance data fed to the underlying algorithms. One of the fundamental problems analysts face in preparing data for forecasting is the timely identification of data discontinuities. A discontinuity is an abrupt change in the time-series pattern of a performance counter that persists but does not recur. Analysts need to identify discontinuities in performance data so that they can (a) remove the discontinuities from the data before building a forecast model and (b) retrain an existing forecast model on the performance data from the point in time where a discontinuity occurred. Several approaches and tools exist to help analysts identify anomalies in performance data. However, no automated approach exists to assist data center operators in detecting discontinuities. In this paper, we present and evaluate an approach to help data center analysts and cloud providers automatically detect discontinuities. A case study on performance data obtained from a large cloud provider, together with performance tests conducted on an open source benchmark system, shows that our approach achieves an average precision of 84% and recall of 88%. The approach requires no domain knowledge to operate.
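To make the notion of a discontinuity concrete: it is a level shift in a performance counter that persists, as opposed to a transient spike (an anomaly). The sketch below is not the paper's method; it is a minimal, hypothetical illustration of detecting such a shift by comparing the means of two adjacent sliding windows against their pooled within-window variance. A production detector would additionally need to verify that the new level persists and does not recur.

```python
import math
import statistics

def detect_discontinuities(series, window=5, threshold=3.0):
    """Flag candidate discontinuity points in a performance-counter series.

    Index i is flagged when the mean of the `window` points after i
    differs from the mean of the `window` points before i by more than
    `threshold` times the pooled within-window standard deviation.
    This is an illustrative sketch, not the approach from the paper.
    """
    flags = []
    for i in range(window, len(series) - window + 1):
        before = series[i - window:i]
        after = series[i:i + window]
        # Pooled std from the variance *within* each window, so the
        # shift itself does not inflate the noise estimate.
        pooled = math.sqrt(
            (statistics.pvariance(before) + statistics.pvariance(after)) / 2
        )
        if pooled == 0:
            continue
        if abs(statistics.mean(after) - statistics.mean(before)) > threshold * pooled:
            flags.append(i)
    return flags

# A counter oscillating around 10 that shifts to ~50 at index 20:
data = [10, 11] * 10 + [50, 51] * 10
print(detect_discontinuities(data))  # the shift point, index 20
```

Note that this window test alone cannot distinguish a persistent shift from the leading edge of a long transient anomaly; that distinction, which the paper's problem statement hinges on, requires inspecting whether the series later returns to its old level.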





Acknowledgments

We are grateful to C.A. Technologies Inc. for supporting and funding this research, and for providing access to the production data used in our case study. The findings and opinions expressed in this paper are those of the authors and do not necessarily represent or reflect those of C.A. Technologies and/or its subsidiaries and affiliates. This work was funded in part by a Collaborative Research and Development grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information


Corresponding author

Correspondence to Haroon Malik.


About this article


Cite this article

Malik, H., Davis, I.J., Godfrey, M.W. et al. Connecting the dots: anomaly and discontinuity detection in large-scale systems. J Ambient Intell Human Comput 7, 509–522 (2016). https://doi.org/10.1007/s12652-016-0381-4

