
R3: repeatability, reproducibility and rigor

Published: 18 March 2012

Abstract

Computer systems research spans sub-disciplines that include embedded systems, programming languages and compilers, networking, and operating systems. Our contention is that a number of structural factors inhibit high-quality systems research. We highlight some of the factors we have encountered in our own work and observed in published papers, and we propose solutions that could increase both the productivity of researchers and the quality of their output.

Published In

ACM SIGPLAN Notices, Volume 47, Issue 4a (Supplemental issue), April 2012
85 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2442776

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 18 March 2012
Published in SIGPLAN Volume 47, Issue 4a

Qualifiers

  • Column
