DOI: 10.1145/2451116.2451141
research-article

STABILIZER: statistically sound performance evaluation

Published: 16 March 2013

Abstract

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.
Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.
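To make the "reject the null hypothesis" step concrete, here is a minimal Python sketch on hypothetical run times. It uses a permutation test, a distribution-free test swapped in for illustration rather than the specific test the paper applies. The function name `permutation_test` and all timing values are assumptions. In the setting the abstract describes, each timing would have to come from a run with an independently randomized layout; repeated runs of one fixed binary only resample a single layout.

```python
import random
from statistics import fmean


def permutation_test(before, after, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean run time.

    Null hypothesis: the `before` and `after` timings come from the same
    distribution, so swapping their labels should not matter.
    Returns (observed mean difference, estimated p-value).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = fmean(after) - fmean(before)
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign runs to "before"/"after"
        diff = fmean(pooled[n_before:]) - fmean(pooled[:n_before])
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction: never report an estimated p-value of exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical run times in seconds, one per randomized-layout run.
    before = [10.2, 10.5, 9.9, 10.4, 10.1, 10.3, 10.0, 10.6]
    after = [9.6, 9.8, 9.5, 9.9, 9.7, 9.4, 9.8, 9.6]
    diff, p = permutation_test(before, after)
    verdict = "reject" if p < 0.05 else "cannot reject"
    print(f"mean difference {diff:+.4f} s, p = {p:.4f}: {verdict} the null hypothesis")
```

The test never assumes a particular distribution; it only asks how often a random relabeling of the same runs produces a gap at least as large as the observed one.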
This paper presents Stabilizer, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. Stabilizer thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate Stabilizer's efficiency (<7% median overhead) and its effectiveness by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.
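ANOVA extends the two-sample comparison to several configurations at once, such as several optimization levels. The sketch below computes the one-way ANOVA F statistic in plain Python. It is not Stabilizer's own analysis code; the function name `one_way_anova_f`, the configuration labels, and the numbers are hypothetical. A large F means the spread between configuration means is large relative to the run-to-run spread within each configuration. ANOVA's p-values assume roughly normal groups with similar variance, which is the property the abstract attributes to re-randomization.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` holds one list of run times per configuration; each run is
    assumed to use an independently re-randomized memory layout.
    Hypothetical helper for illustration only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of configuration means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of individual runs around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical run times (seconds) for three build configurations.
    runs = {
        "A": [13.0, 14.0, 15.0],
        "B": [10.0, 11.0, 12.0],
        "C": [10.0, 11.0, 12.0],
    }
    f_stat, df_b, df_w = one_way_anova_f(list(runs.values()))
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

To turn F into a p-value, compare it against the F distribution with (df_between, df_within) degrees of freedom; SciPy's `scipy.stats.f_oneway` computes the same statistic together with a p-value.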

Information

Published In

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116
  • ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
    April 2013
    540 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2499368
  • ACM SIGARCH Computer Architecture News, Volume 41, Issue 1 (ASPLOS '13)
    March 2013
    540 pages
    ISSN: 0163-5964
    DOI: 10.1145/2490301
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. measurement bias
  2. performance evaluation
  3. randomization

Qualifiers

  • Research-article

Conference

ASPLOS '13

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

  • Downloads (last 12 months): 80
  • Downloads (last 6 weeks): 11
Reflects downloads up to 18 Jan 2025

Cited By

  • (2024) AI-driven Java Performance Testing: Balancing Result Quality with Testing Time. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 443-454. DOI: 10.1145/3691620.3695017. Online publication date: 27-Oct-2024.
  • (2024) Quality Assurance for Non-trivial Systems: Use Case GCC Plugins. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1915-1916. DOI: 10.1145/3650212.3685557. Online publication date: 11-Sep-2024.
  • (2024) Indexed Types for a Statically Safe WebAssembly. Proceedings of the ACM on Programming Languages 8(POPL), pp. 2395-2424. DOI: 10.1145/3632922. Online publication date: 5-Jan-2024.
  • (2023) Formally Verifying Optimizations with Block Simulations. Proceedings of the ACM on Programming Languages 7(OOPSLA2), pp. 59-88. DOI: 10.1145/3622799. Online publication date: 16-Oct-2023.
  • (2023) Optimizing the Order of Bytecode Handlers in Interpreters using a Genetic Algorithm. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pp. 1384-1393. DOI: 10.1145/3555776.3577712. Online publication date: 27-Mar-2023.
  • (2023) DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-Based Systems. IEEE Transactions on Software Engineering, pp. 1-28. DOI: 10.1109/TSE.2023.3266041. Online publication date: 2023.
  • (2023) Analysing the Impact of Workloads on Modeling the Performance of Configurable Software Systems. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 2085-2097. DOI: 10.1109/ICSE48619.2023.00176. Online publication date: May-2023.
  • (2023) CryptOpt: Automatic Optimization of Straightline Code. 2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pp. 141-145. DOI: 10.1109/ICSE-Companion58688.2023.00042. Online publication date: May-2023.
  • (2023) Testing a Formally Verified Compiler. Tests and Proofs, pp. 40-48. DOI: 10.1007/978-3-031-38828-6_3. Online publication date: 18-Jul-2023.
  • (2022) A fast in-place interpreter for WebAssembly. Proceedings of the ACM on Programming Languages 6(OOPSLA2), pp. 646-672. DOI: 10.1145/3563311. Online publication date: 31-Oct-2022.
