DOI: 10.1145/3030207.3044533
research-article

Technique for Detecting Early-Warning Signals of Performance Deterioration in Large Scale Software Systems

Published: 17 April 2017

Abstract

The detection of early-warning signals of performance deterioration can help technical support teams take swift remedial action, thus ensuring rigor in the production support operations of large scale software systems. Performance anomalies or deterioration, if left unattended, often result in system slowness and unavailability. In this paper, we present a simple, intuitive and low-overhead technique for recognizing early warning signs in near real time, before they impact the system. The technique is based on the inverse relationship that exists between throughput and average response time in a closed system. Because of this relationship, a significant increase in the average system response time causes an abrupt fall in system throughput. To identify such occurrences automatically, Individuals and Moving Range (XmR) control charts are used. We also provide a case study from a real-world production system in which the technique has been successfully used. The use of this technique has significantly reduced the occurrence of performance-related incidents in our daily operations. The technique is tool agnostic and can also be easily implemented in popular system monitoring tools by building custom extensions.
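The relationship the abstract relies on can be made concrete. In a closed system with N concurrent users and think time Z, the interactive response time law (a corollary of Little's law) gives throughput X = N / (R + Z), so a sharp rise in average response time R must show up as a drop in X. The sketch below is an illustrative reconstruction, not the authors' implementation: it derives Individuals (X) chart limits from a healthy baseline window as mean ± 2.66 × average moving range, then flags live throughput samples that fall below the lower control limit as early-warning signals. The function names (xmr_limits, early_warnings) and the sample throughput figures are assumptions made for the example.

# Illustrative sketch of XmR-based early-warning detection on a throughput
# series (requests per minute). Assumes a clean baseline window is available.

def xmr_limits(baseline):
    """Individuals (X) chart control limits from a baseline window of observations."""
    mean = sum(baseline) / len(baseline)
    moving_ranges = [abs(a - b) for a, b in zip(baseline[1:], baseline)]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two consecutive points
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

def early_warnings(baseline, live):
    """Return (index, value) pairs where live throughput drops below the lower limit."""
    lcl, _ucl = xmr_limits(baseline)
    return [(i, x) for i, x in enumerate(live) if x < lcl]

if __name__ == "__main__":
    baseline = [118, 122, 120, 119, 121, 123, 117, 120, 122, 119]  # healthy period
    live = [121, 118, 95, 92, 120]                                 # abrupt fall at indices 2-3
    print(early_warnings(baseline, live))                          # -> [(2, 95), (3, 92)]

A real deployment would presumably recompute the baseline per time-of-day or per workload period rather than use a single static window; that choice, too, is an assumption made here for brevity.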


Cited By

  • (2025) Performance regression testing initiatives. Information and Software Technology, 179(C). https://doi.org/10.1016/j.infsof.2024.107641. Online publication date: 1-Mar-2025.
  • (2020) Investigating types and survivability of performance bugs in mobile apps. Empirical Software Engineering, 25(3):1644-1686. https://doi.org/10.1007/s10664-019-09795-6. Online publication date: 1-May-2020.
  • (2018) One Size Does Not Fit All. Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering, 211-222. https://doi.org/10.1145/3184407.3184418. Online publication date: 30-Mar-2018.


    Published In

    ICPE '17: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering
    April 2017
    450 pages
    ISBN:9781450344043
    DOI:10.1145/3030207
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 April 2017


    Author Tags

    1. closed systems
    2. little's law
    3. performance anomaly
    4. performance deterioration
    5. production systems
    6. system monitoring

    Qualifiers

    • Research-article

    Conference

    ICPE '17

    Acceptance Rates

    ICPE '17 Paper Acceptance Rate: 27 of 83 submissions, 33%
    Overall Acceptance Rate: 252 of 851 submissions, 30%


