Continuous validation of performance test workloads

Syer, Mark D.; Shang, Weiyi; Jiang, Zhen Ming; Hassan, Ahmed E.

doi:10.1007/s10515-016-0196-8

Published: 01 April 2016

Volume 24, pages 189–231, (2017)

Automated Software Engineering

Mark D. Syer¹,
Weiyi Shang¹,
Zhen Ming Jiang² &
…
Ahmed E. Hassan¹


Abstract

The rise of large-scale software systems poses many new challenges for the software performance engineering field. Failures in these systems are often associated with performance issues rather than with feature bugs. Performance testing has therefore become essential to ensuring the problem-free operation of these systems. However, the performance testing process faces a major challenge: evolving field workloads, in terms of evolving feature sets and usage patterns, often lead to "outdated" tests that no longer reflect the field. Hence, performance analysts must continually validate whether their tests still reflect the field. Such validation may be performed by comparing execution logs from the test and the field. However, the size and unstructured nature of execution logs make such a comparison infeasible without automated support. In this paper, we propose an automated approach that validates whether a performance test resembles the field workload and, if not, determines how they differ. Performance analysts can then update their tests to eliminate such differences, creating more realistic tests. We perform six case studies on two large systems: one open-source system and one enterprise system. Our approach identifies differences between performance tests and the field with a precision of 92%, compared to only 61% for the state of the practice and 19% for a conventional statistical comparison.



Acknowledgments

We would like to thank BlackBerry for providing access to the enterprise system used in our case study. The findings and opinions expressed in this paper are those of the authors and do not necessarily represent or reflect those of BlackBerry and/or its subsidiaries and affiliates. Moreover, our results do not reflect the quality of BlackBerry’s products. We would also like to thank Microsoft Azure for (1) providing us access to a large-scale deployment and (2) working closely with us to set up and troubleshoot our deployment.

Author information

Authors and Affiliations

Software Analysis and Intelligence Lab (SAIL), School of Computing, Queen’s University, Kingston, Canada
Mark D. Syer, Weiyi Shang & Ahmed E. Hassan
Department of Electrical Engineering & Computer Science, York University, Toronto, Canada
Zhen Ming Jiang

Corresponding author

Correspondence to Mark D. Syer.

About this article

Cite this article

Syer, M.D., Shang, W., Jiang, Z.M. et al. Continuous validation of performance test workloads. Autom Softw Eng 24, 189–231 (2017). https://doi.org/10.1007/s10515-016-0196-8

Received: 30 June 2014
Accepted: 14 March 2016
Published: 01 April 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10515-016-0196-8
