ABSTRACT
Background. Artifact evaluation was introduced into the software engineering and programming languages research community with a pilot at ESEC/FSE 2011 and has since seen healthy adoption across the conference landscape. Objective. In this qualitative study, we examine the community's expectations toward research artifacts and their evaluation processes. Method. We surveyed all members of artifact evaluation committees of major software engineering and programming languages conferences since the first pilot and compared their answers to the expectations set by calls for artifacts and reviewing guidelines. Results. While some expectations exceed those expressed in calls and reviewing guidelines, there is no consensus on quality thresholds for artifacts in general. We observe very specific quality expectations for particular artifact types, both for review and for later use, but find that calls rarely communicate them. We also find problematic inconsistencies in the terminology used to express the most important purpose of artifact evaluation: replicability. Conclusion. We derive several actionable suggestions that can help mature artifact evaluation in the inspected community and aid its introduction into other communities in computer science.