Abstract
The replication crisis has further eroded the public’s trust in science. Many famous studies, even those published in renowned journals, fail to produce the same results when replicated by other researchers. While this is the outcome of several problems in research, one aspect has received critical attention: reproducibility. The term reproducible research refers to studies that contain all materials other researchers need to reproduce the scientific results. This allows others to identify flaws in calculations and improves scientific rigor. In this paper, we show a workflow for reproducible research using the R language and a set of additional packages and tools that simplify the procedure.
References
Aggarwal, C.C., Yu, P.S.: A general survey of privacy-preserving data mining models and algorithms. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-preserving data mining, vol. 34, pp. 11–52. Springer, Heidelberg (2008). https://doi.org/10.1007/978-0-387-70992-5_2
Aust, F.: citr: RStudio Add-in to Insert Markdown Citations. R package version 0.3.2. (2019). https://CRAN.R-project.org/package=citr
Baker, M.: Reproducibility crisis. Nature 533(26), 353–66 (2016)
Barnier, J.: rmdformats: HTML Output Formats and Templates for ‘rmarkdown’ Documents. R package version 0.3.6. (2019). https://CRAN.R-project.org/package=rmdformats
Barnier, J., Briatte, F., Larmarange, J.: questionr: Functions to Make Surveys Processing Easier. R package version 0.7.0. (2018). https://CRAN.R-project.org/package=questionr
Bryan, J.: Excuse me, do you have a moment to talk about version control? Am. Stat. 72(1), 20–27 (2018)
Calero Valdez, A.: rmdtemplates: rmdtemplates - an opinionated collection of rmarkdown templates. R package version 0.4.0.0000. (2020). https://github.com/statisticsforsocialscience/rmd_templates
Chang, W.: webshot: Take Screenshots of Web Pages. R package version 0.5.2. (2019). https://CRAN.R-project.org/package=webshot
Colquhoun, D.: The reproducibility of research and the misinterpretation of p-values. Roy. Soc. Open Sci. 4(12), 171085 (2017)
Dumas, J., Marwick, B., Shotwell, G.: gramr: The Grammar of Grammar. R package version 0.0.0.9000. (2020). https://github.com/ropenscilabs/gramr
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Gentleman, R., Lang, D.T.: Statistical analyses and reproducible research. J. Comput. Graph. Stat. 16(1), 1–23 (2007)
Head, M.L., et al.: The extent and consequences of p-hacking in science. PLoS Biol. 13(3), e1002106 (2015)
Hendricks, P.: anonymizer: Anonymize data containing personally identifiable information. R package version 0.2.2. (2020). https://github.com/paulhendricks/anonymizer
Iannone, R.: DiagrammeR: Graph/Network Visualization. R package version 1.1.0. (2020). https://github.com/rich-iannone/DiagrammeR
Kerr, N.L.: HARKing: hypothesizing after the results are known. Pers. Soc. Psychol. Rev. 2(3), 196–217 (1998)
Landau, W.M.: drake: A Pipeline Toolkit for Reproducible Computation at Scale. R package version 7.10.0. (2020). https://CRAN.R-project.org/package=drake
Lee, J., Clifton, C.: How much is enough? Choosing ε for differential privacy. Inf. Secur. 7001, 325–340 (2011)
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115. IEEE (2007)
Machanavajjhala, A., et al.: l-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data (TKDD) 1(1), 3 (2007)
Marwick, B.: rrtools: Creates a Reproducible Research Compendium. R package version 0.1.0. (2019). https://github.com/benmarwick/rrtools
Marwick, B., Boettiger, C., Mullen, L.: Packaging data analytical work reproducibly using R (and friends). Am. Stat. 72(1), 80–88 (2018)
Meyer, F., Perrier, V.: esquisse: Explore and Visualize Your Data Interactively. R package version 0.3.0. (2020). https://CRAN.R-project.org/package=esquisse
Meyers, N.K.: Reproducible Research and the Open Science Framework (2017). https://osf.io/458u9/
Müller, K.: here: A Simpler Way to Find Your Files. R package version 0.1. (2017). https://CRAN.R-project.org/package=here
Open Science Collaboration et al.: Estimating the reproducibility of psychological science. Science 349(6251), aac4716 (2015)
Patil, I.: ggstatsplot: “ggplot2” Based Plots with Statistical Details. R package version 0.2.0. (2020). https://CRAN.R-project.org/package=ggstatsplot
Revelle, W.: psych: Procedures for Psychological, Psychometric, and Personality Research. R package version 1.9.12.31. (2020). https://CRAN.R-project.org/package=psych
Simonsohn, U., Nelson, L.D., Simmons, J.P.: p-curve and effect size: correcting for publication bias using only significant results. Perspect. Psychol. Sci. 9(6), 666–681 (2014)
Templ, M., Meindl, B., Kowarik, A.: sdcMicro: Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation. R package version 5.5.1. (2020). https://CRAN.R-project.org/package=sdcMicro
Ushey, K.: renv: Project Environments. R package version 0.9.3-30. (2020). https://rstudio.github.io/renv
Ushey, K., et al.: packrat: A Dependency Management System for Projects and their R Package Dependencies. R package version 0.5.0. (2018). https://CRAN.R-project.org/package=packrat
Wickham, H.: forcats: Tools for Working with Categorical Variables (Factors) (2020). http://forcats.tidyverse.org, https://github.com/tidyverse/forcats
Wickham, H.: tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.3.0. (2019). https://CRAN.R-project.org/package=tidyverse
Wickham, H., Bryan, J.: usethis: Automate Package and Project Setup. R package version 1.5.1. (2019). https://CRAN.R-project.org/package=usethis
Wickham, H., Seidel, D.: scales: Scale Functions for Visualization. R package version 1.1.0. (2019). https://CRAN.R-project.org/package=scales
Wilson, G., et al.: Good enough practices in scientific computing. PLoS Comput. Biol. 13(6), e1005510 (2017)
Wolen, A., Hartgerink, C.: osfr: Interface to the ‘Open Science Framework’ (‘OSF’). R package version 0.2.8. (2020). https://CRAN.R-project.org/package=osfr
Xie, Y.: knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.28. (2020). https://CRAN.R-project.org/package=knitr
Zhu, H.: kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.1.0. (2019). https://CRAN.R-project.org/package=kableExtra
Acknowledgements
This research was supported by the Digital Society research program funded by the Ministry of Culture and Science of the German State of North Rhine-Westphalia. We would further like to thank the authors of the packages we have used. We used the following packages to create this document: knitr [39], tidyverse [34], rmdformats [4], kableExtra [40], scales [36], psych [28], rmdtemplates [7], sdcMicro [30], webshot [8], here [25], DiagrammeR [15], citr [2], drake [17], esquisse [23], usethis [35], gramr [10], questionr [5], ggstatsplot [27].
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Calero Valdez, A. (2020). Making Reproducible Research Simple Using RMarkdown and the OSF. In: Meiselwitz, G. (eds) Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis. HCII 2020. Lecture Notes in Computer Science, vol 12194. Springer, Cham. https://doi.org/10.1007/978-3-030-49570-1_3
DOI: https://doi.org/10.1007/978-3-030-49570-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49569-5
Online ISBN: 978-3-030-49570-1
eBook Packages: Computer Science (R0)