DOI: 10.1145/3603166.3632128

The Early Microbenchmark Catches the Bug -- Studying Performance Issues Using Micro- and Application Benchmarks

Published: 04 April 2024

Abstract

An application's performance regressions can be detected by both application benchmarks and microbenchmarks. While application benchmarks stress the system under test by sending synthetic but realistic requests that, e.g., simulate real user traffic, microbenchmarks evaluate performance at the subroutine level by calling the function under test repeatedly.
In this paper, we use a testbed microservice application that includes three performance issues to study the detection capabilities of both approaches. In extensive benchmarking experiments, we increase the severity of each performance issue stepwise, run both an application benchmark and the microbenchmark suite, and check at which point each benchmark detects the performance issue. Our results show that microbenchmarks detect all three issues earlier, some even at the lowest severity level. Application benchmarks, however, raise false positive alarms, wrongly report performance improvements, and detect the performance issues later.
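
To illustrate the distinction drawn in the abstract, the following minimal Go sketch contrasts the two benchmark styles. It is not taken from the paper's testbed; the subroutine parseOrder, the target URL, and the request count are hypothetical placeholders used only to show the idea.

// Hypothetical illustration of the two benchmark styles discussed in the abstract.
package bench

import (
	"net/http"
	"testing"
	"time"
)

// parseOrder stands in for an arbitrary subroutine under test.
func parseOrder(raw string) int { return len(raw) }

// sink keeps the compiler from discarding the benchmarked call.
var sink int

// Microbenchmark: the function under test is called repeatedly in isolation,
// here via Go's built-in testing harness (run with `go test -bench=.`).
func BenchmarkParseOrder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = parseOrder(`{"id": 42, "items": 3}`)
	}
}

// applicationBenchmark stands in for a load generator: it stresses the whole
// deployed system with synthetic requests and records end-to-end latencies.
func applicationBenchmark(url string, requests int) []time.Duration {
	latencies := make([]time.Duration, 0, requests)
	for i := 0; i < requests; i++ {
		start := time.Now()
		resp, err := http.Get(url) // one synthetic user request
		if err == nil {
			resp.Body.Close()
		}
		latencies = append(latencies, time.Since(start))
	}
	return latencies
}

In this sketch, the microbenchmark isolates a single function, while the application benchmark only observes the system from the outside, which is why regressions in one subroutine may surface there only at higher severities.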


Cited By

  • Increasing Efficiency and Result Reliability of Continuous Benchmarking for FaaS Applications. 2024 IEEE International Conference on Cloud Engineering (IC2E), 93-100. DOI: 10.1109/IC2E61754.2024.00017. Online publication date: 24 Sep 2024.
  • ElastiBench: Scalable Continuous Benchmarking on Cloud FaaS Platforms. 2024 IEEE International Conference on Cloud Engineering (IC2E), 83-92. DOI: 10.1109/IC2E61754.2024.00016. Online publication date: 24 Sep 2024.


Published In

UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing
December 2023
502 pages
ISBN:9798400702341
DOI:10.1145/3603166


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. microbenchmarks
  2. benchmarking
  3. performance issues

Qualifiers

  • Research-article

Conference

UCC '23

Acceptance Rates

Overall Acceptance Rate 38 of 125 submissions, 30%

