Abstract
Scientific research is supported by computing techniques and tools that allow scientific data and experiments to be gathered, managed, analyzed, visualized, shared, and reproduced. The simulations performed in this type of research are called in silico experiments, and they are commonly composed of several applications that execute established algorithms and methods. Reproducibility plays a key role: it gives scientists the ability to change the data and the test environment of an experiment in order to evaluate the robustness of the proposed scientific method. Verifying and validating the results generated by these experiments increases the productivity and quality of scientific data analysis, improving the development of science and the production of complex data in various scientific domains. Enabling experimental reproducibility of in silico experiments poses many challenges, most of them related to guaranteeing that simulation programs and data are still available when scientists need to reproduce an experiment. Clouds can play a key role here by offering an infrastructure for the long-term preservation of programs and data. The goal of this chapter is to characterize the terms and requirements related to scientific reproducibility and to show how clouds can aid the development and selection of reproducibility approaches in science.
Acknowledgements
This work was partially funded by Brazilian agencies CAPES, FAPERJ, and CNPq.
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
de Oliveira, A.H.M., de Oliveira, D., Mattoso, M. (2017). Clouds and Reproducibility: A Way to Go to Scientific Experiments?. In: Antonopoulos, N., Gillam, L. (eds) Cloud Computing. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-54645-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54644-5
Online ISBN: 978-3-319-54645-2