Research article
DOI: 10.1145/1508244.1508275

Producing wrong data without doing anything obviously wrong!

Published: 07 March 2009

Abstract

This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment, because what appears innocuous may in fact introduce significant bias into an evaluation. This phenomenon is called measurement bias in the natural and social sciences.
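One setup detail the paper examines is the total size of the UNIX environment: extra environment bytes shift the program's initial stack address, which changes the memory alignment an unmodified binary sees. As a rough illustration of that kind of experiment (a minimal Python sketch; the deterministic, CPU-bound executable at ./benchmark is a hypothetical stand-in), the harness below times the same binary on the same input while varying only the environment size:

```python
import os
import subprocess
import time

BINARY = "./benchmark"  # hypothetical: any deterministic, CPU-bound executable

def time_run(env_padding_bytes):
    """Time one run of BINARY with the environment padded by the given
    number of bytes. Extra environment bytes shift where the OS places
    the initial stack, which can change the alignment the program sees."""
    env = dict(os.environ)
    env["PADDING"] = "x" * env_padding_bytes
    start = time.perf_counter()
    subprocess.run([BINARY], env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

# Identical binary, identical input: only the environment size differs.
for padding in range(0, 4096, 512):
    print(f"env padding {padding:4d} bytes: {time_run(padding):.3f} s")
```

Depending on the machine and benchmark, the timings from a loop like this can differ noticeably, even though nothing about the program itself has changed.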
Our results demonstrate that measurement bias is significant and commonplace in computer system evaluation. By significant we mean that measurement bias can lead to a performance analysis that either overstates an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs on all architectures we tried (Pentium 4, Core 2, and m5 O3CPU), with both compilers we tried (gcc and Intel's C compiler), and in most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we found that none of the papers with experimental results adequately consider measurement bias.
Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods: causal analysis for detecting measurement bias, and setup randomization for avoiding it.
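As a rough sketch of what setup randomization can look like in practice (under assumptions, not the paper's exact protocol): instead of measuring under one fixed, arbitrary setup, repeat the measurement across many randomized setups and summarize the resulting distribution, so the reported number generalizes beyond a single setup. Causal analysis runs in the other direction: deliberately vary a suspected bias source and check whether the measured effect tracks it. The Python sketch below randomizes one setup dimension, the environment size, and reports a mean with an approximate 95% confidence interval (the hypothetical ./benchmark executable again stands in for the system under evaluation):

```python
import os
import random
import statistics
import subprocess
import time

BINARY = "./benchmark"  # hypothetical executable under evaluation

def time_once(env_padding_bytes):
    """Time one run of BINARY under one randomized setup."""
    env = dict(os.environ)
    env["PADDING"] = "x" * env_padding_bytes
    start = time.perf_counter()
    subprocess.run([BINARY], env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

# Draw the biasing setup dimension (here: environment size) at random for
# each run, then summarize the distribution rather than a single setup.
samples = [time_once(random.randrange(4096)) for _ in range(30)]
mean = statistics.mean(samples)
half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
print(f"time: {mean:.3f} s ± {half_width:.3f} s (95% CI, n={len(samples)})")
```

A fuller randomization would vary other setup dimensions, such as link order, in the same way.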


Published In

ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, March 2009, 358 pages. ISBN: 9781605584065. DOI: 10.1145/1508244.

Also published in:
• ACM SIGARCH Computer Architecture News, Volume 37, Issue 1 (ASPLOS 2009), March 2009. ISSN: 0163-5964. DOI: 10.1145/2528521.
• ACM SIGPLAN Notices, Volume 44, Issue 3 (ASPLOS 2009), March 2009. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/1508284.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. bias
2. measurement
3. performance
