Research article
DOI: 10.1145/1508244.1508275

Producing wrong data without doing anything obviously wrong!

Published: 07 March 2009

Abstract

This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment, because what appears innocuous may in fact introduce significant bias into an evaluation. This phenomenon is called measurement bias in the natural and social sciences.
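One setup detail the paper examines is the total size of the UNIX environment: extra environment bytes shift the program's initial stack address, which changes the memory alignment an unmodified binary sees. As a rough illustration of that kind of experiment (a minimal Python sketch; the deterministic, CPU-bound executable at ./benchmark is a hypothetical stand-in), the harness below times the same binary on the same input while varying only the environment size:

```python
import os
import subprocess
import time

BINARY = "./benchmark"  # hypothetical: any deterministic, CPU-bound executable

def time_run(env_padding_bytes):
    """Time one run of BINARY with the environment padded by the given
    number of bytes. Extra environment bytes shift where the OS places
    the initial stack, which can change the alignment the program sees."""
    env = dict(os.environ)
    env["PADDING"] = "x" * env_padding_bytes
    start = time.perf_counter()
    subprocess.run([BINARY], env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

# Identical binary, identical input: only the environment size differs.
for padding in range(0, 4096, 512):
    print(f"env padding {padding:4d} bytes: {time_run(padding):.3f} s")
```

Depending on the machine and benchmark, the timings from a loop like this can differ noticeably, even though nothing about the program itself has changed.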
Our results demonstrate that measurement bias is significant and commonplace in computer system evaluation. By significant we mean that measurement bias can lead to a performance analysis that either overstates an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs on all architectures we tried (Pentium 4, Core 2, and m5 O3CPU), with both compilers we tried (gcc and Intel's C compiler), and in most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we found that none of the papers with experimental results adequately consider measurement bias.
Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods: causal analysis for detecting measurement bias, and setup randomization for avoiding it.
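As a rough sketch of what setup randomization can look like in practice (under assumptions, not the paper's exact protocol): instead of measuring under one fixed, arbitrary setup, repeat the measurement across many randomized setups and summarize the resulting distribution, so the reported number generalizes beyond a single setup. Causal analysis runs in the other direction: deliberately vary a suspected bias source and check whether the measured effect tracks it. The Python sketch below randomizes one setup dimension, the environment size, and reports a mean with an approximate 95% confidence interval (the hypothetical ./benchmark executable again stands in for the system under evaluation):

```python
import os
import random
import statistics
import subprocess
import time

BINARY = "./benchmark"  # hypothetical executable under evaluation

def time_once(env_padding_bytes):
    """Time one run of BINARY under one randomized setup."""
    env = dict(os.environ)
    env["PADDING"] = "x" * env_padding_bytes
    start = time.perf_counter()
    subprocess.run([BINARY], env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

# Draw the biasing setup dimension (here: environment size) at random for
# each run, then summarize the distribution rather than a single setup.
samples = [time_once(random.randrange(4096)) for _ in range(30)]
mean = statistics.mean(samples)
half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
print(f"time: {mean:.3f} s ± {half_width:.3f} s (95% CI, n={len(samples)})")
```

A fuller randomization would vary other setup dimensions, such as link order, in the same way.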


Published In

ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, March 2009, 358 pages. ISBN: 9781605584065. DOI: 10.1145/1508244.

Also published in:
• ACM SIGARCH Computer Architecture News, Volume 37, Issue 1 (ASPLOS 2009), March 2009. ISSN: 0163-5964. DOI: 10.1145/2528521.
• ACM SIGPLAN Notices, Volume 44, Issue 3 (ASPLOS 2009), March 2009. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/1508284.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. bias
2. measurement
3. performance
