DOI: 10.1145/3368089.3409767

Community expectations for research artifacts and evaluation processes

Published: 08 November 2020

ABSTRACT

Background. Artifact evaluation was introduced into the software engineering and programming languages research community with a pilot at ESEC/FSE 2011 and has since enjoyed healthy adoption across the conference landscape. Objective. In this qualitative study, we examine the community's expectations toward research artifacts and their evaluation processes. Method. We surveyed all members of artifact evaluation committees of major conferences in the software engineering and programming languages field since the first pilot, and we compared their answers to the expectations set by calls for artifacts and reviewing guidelines. Results. While some expectations exceed those expressed in calls and reviewing guidelines, there is no consensus on quality thresholds for artifacts in general. We observe very specific quality expectations for particular artifact types, both for review and for later usage, yet these expectations are not communicated in calls. We also find problematic inconsistencies in the terminology used to express artifact evaluation's most important purpose: replicability. Conclusion. We derive several actionable suggestions that can help mature artifact evaluation in the inspected community and aid its introduction into other communities in computer science.


Supplemental Material

fse20main-p970-p-teaser.mp4 (mp4, 30 MB)
fse20main-p970-p-video.mp4 (mp4, 318.4 MB)


Published in

ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2020, 1703 pages
ISBN: 9781450370431
DOI: 10.1145/3368089

Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article

Overall acceptance rate: 112 of 543 submissions, 21%
