Research Data Preservation Using Process Engines and Machine-Actionable Data Management Plans

Bakos, Asztrik; Miksa, Tomasz; Rauber, Andreas

doi:10.1007/978-3-030-00066-0_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11057)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Scientific experiments in many domains now require collecting, processing, and reusing data. Researchers have to comply with funder policies that prescribe how data should be managed, shared, and preserved. In most cases this has to be documented in data management plans. When data is selected and moved into a repository at the end of a project, it is often hard for researchers to identify which files need to be preserved and where they are located. For this reason, we need a mechanism that allows researchers to integrate preservation functionality into their daily data management workflows, so that scientific data does not go unpreserved.

In this paper we demonstrate how systems used for managing data during research can be extended with preservation functions using process engines that run pre-defined preservation workflows. We also show a prototype of a machine-actionable data management plan that is automatically generated during this process to document the actions performed. Thus, we break the traditional distinction between platforms for managing data during research and repositories used for preservation afterwards. Furthermore, we show how researchers can comply with funder requirements more easily while reducing their effort.
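The idea of a workflow that documents its own actions can be illustrated with a minimal sketch. It is not the authors' prototype: a plain Python loop stands in for a process engine, the output is ordinary JSON rather than any standardised plan format, and every task name, field name, and the `repository://example/` location is a hypothetical chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def compute_checksum(ctx):
    """Task: fingerprint each selected file so integrity can be checked later."""
    ctx["checksums"] = {
        name: hashlib.sha256(content).hexdigest()
        for name, content in ctx["files"].items()
    }
    return "checksums computed"


def deposit(ctx):
    """Task: stand-in for a repository deposit; it only records a target location."""
    ctx["location"] = "repository://example/" + ctx["project"]  # hypothetical URI
    return "files deposited"


def run_workflow(tasks, ctx):
    """Run tasks in order and log each performed action (the engine stand-in)."""
    log = []
    for task in tasks:
        outcome = task(ctx)
        log.append({
            "action": task.__name__,
            "outcome": outcome,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
    return log


def build_dmp(ctx, log):
    """Assemble a JSON-serialisable plan from the workflow context and action log."""
    return {
        "project": ctx["project"],
        "datasets": [
            {"file": name, "sha256": digest, "location": ctx["location"]}
            for name, digest in sorted(ctx["checksums"].items())
        ],
        "actions": log,
    }


# Hypothetical project with a single in-memory file.
context = {"project": "demo", "files": {"results.csv": b"a,b\n1,2\n"}}
actions = run_workflow([compute_checksum, deposit], context)
dmp = build_dmp(context, actions)
print(json.dumps(dmp, indent=2))
```

Because the plan is built from the action log rather than written by hand, it records only what the workflow actually did, which is the property the abstract attributes to the generated plan.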

Notes

1. https://www.rd-alliance.org/groups/dmp-common-standards-wg
2. https://osf.io/
3. https://www.alfresco.com/
4. http://www.omg.org/bpmn/
5. IBM Business Process Manager: https://www.ibm.com/us-en/marketplace/business-process-manager
6. A JSON format supporting the use of ontologies: https://json-ld.org/
7. http://bpmn.io/

Acknowledgments

This research was carried out in the context of the Austrian COMET K1 program and publicly funded by the Austrian Research Promotion Agency (FFG) and the Vienna Business Agency (WAW).

Author information

Authors and Affiliations

SBA Research and TU Wien, Favoritenstrasse 16, 1040, Wien, Austria
Asztrik Bakos, Tomasz Miksa & Andreas Rauber

Authors

Asztrik Bakos
Tomasz Miksa
Andreas Rauber

Corresponding author

Correspondence to Tomasz Miksa.

Editor information

Editors and Affiliations

University Carlos III, Madrid, Spain
Eva Méndez
USI, Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Cristina Ribeiro
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Gabriel David
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
João Correia Lopes

Cite this paper

Bakos, A., Miksa, T., Rauber, A. (2018). Research Data Preservation Using Process Engines and Machine-Actionable Data Management Plans. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_6

DOI: https://doi.org/10.1007/978-3-030-00066-0_6
Published: 05 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science, Computer Science (R0)
