Abstract
Scientific experiments in various domains nowadays require collecting, processing, and reusing data. Researchers have to comply with funder policies that prescribe how data should be managed, shared, and preserved. In most cases, this has to be documented in data management plans. When data is selected and moved into a repository at the end of a project, it is often hard for researchers to identify which files need to be preserved and where they are located. For this reason, we need a mechanism that allows researchers to integrate preservation functionality into their daily data management workflows, to avoid situations in which scientific data is not properly preserved.
In this paper, we demonstrate how systems used for managing data during research can be extended with preservation functions using process engines that run pre-defined preservation workflows. We also present a prototype of a machine-actionable data management plan that is automatically generated during this process to document the actions performed. Thus, we break down the traditional distinction between platforms for managing data during research and repositories used for preservation afterwards. Furthermore, we show how researchers can more easily comply with funder requirements while reducing their effort.
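The idea of a workflow that documents itself in a machine-actionable plan can be illustrated with a minimal Python sketch. It is not the authors' implementation. The "process engine" here is just a loop over hypothetical step functions (`compute_fixity`, `select_for_preservation`, `deposit`). Working storage and the repository are in-memory dicts. The rule that `.tmp` files are not preserved is an assumption, and the `@context` vocabulary URL is a placeholder. Each step returns a short description of what it did. The engine appends that description, with a timestamp, to a JSON-LD-style plan record, so the plan is filled in as a side effect of running the workflow.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable

# A workflow step mutates the shared context and returns a description of its action.
Step = Callable[[dict], str]

def compute_fixity(ctx: dict) -> str:
    """Record a SHA-256 checksum for every file in (in-memory) working storage."""
    ctx["checksums"] = {
        name: hashlib.sha256(data).hexdigest() for name, data in ctx["files"].items()
    }
    return f"computed SHA-256 checksums for {len(ctx['checksums'])} files"

def select_for_preservation(ctx: dict) -> str:
    """Hypothetical selection rule: preserve everything except temporary .tmp files."""
    ctx["selected"] = sorted(n for n in ctx["files"] if not n.endswith(".tmp"))
    return f"selected {len(ctx['selected'])} files for preservation"

def deposit(ctx: dict) -> str:
    """Verify each selected file against its checksum, then copy it to the repository."""
    for name in ctx["selected"]:
        data = ctx["files"][name]
        if hashlib.sha256(data).hexdigest() != ctx["checksums"][name]:
            raise ValueError(f"fixity check failed for {name}")
        ctx["repository"][name] = data
    return f"deposited {len(ctx['selected'])} files"

def run_workflow(steps: list[Step], ctx: dict) -> dict:
    """Run the pre-defined steps in order and log each one into a JSON-LD-style plan."""
    madmp = {
        "@context": {"@vocab": "https://example.org/madmp#"},  # placeholder vocabulary
        "@type": "DataManagementPlan",
        "actions": [],
    }
    for step in steps:
        outcome = step(ctx)
        madmp["actions"].append({
            "@type": "PreservationAction",
            "step": step.__name__,
            "outcome": outcome,
            "performedAt": datetime.now(timezone.utc).isoformat(),
        })
    return madmp

# Usage: three files in working storage, one of which is temporary.
ctx = {
    "files": {"results.csv": b"a,b\n1,2\n", "scratch.tmp": b"tmp", "model.py": b"print(1)\n"},
    "repository": {},
}
plan = run_workflow([compute_fixity, select_for_preservation, deposit], ctx)
print(json.dumps(plan, indent=2))
```

In a real deployment, the steps would be modelled and executed by an actual process engine. The plan would also follow an agreed maDMP schema rather than this ad hoc structure.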
Notes
- 5. IBM Business Process Manager: https://www.ibm.com/us-en/marketplace/business-process-manager.
- 6. A JSON format supporting the use of ontologies: https://json-ld.org/.
Acknowledgments
This research was carried out in the context of the Austrian COMET K1 program and publicly funded by the Austrian Research Promotion Agency (FFG) and the Vienna Business Agency (WAW).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Bakos, A., Miksa, T., Rauber, A. (2018). Research Data Preservation Using Process Engines and Machine-Actionable Data Management Plans. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_6
DOI: https://doi.org/10.1007/978-3-030-00066-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science, Computer Science (R0)