Research Data Preservation Using Process Engines and Machine-Actionable Data Management Plans

  • Conference paper
Digital Libraries for Open Knowledge (TPDL 2018)

Abstract

Scientific experiments in many domains now require collecting, processing, and reusing data. Researchers have to comply with funder policies that prescribe how data should be managed, shared, and preserved. In most cases this has to be documented in data management plans. When data is selected and moved into a repository at the end of a project, it is often hard for researchers to identify which files need to be preserved and where they are located. We therefore need a mechanism that lets researchers integrate preservation functionality into their daily data management workflows, so that scientific data does not end up improperly preserved.

In this paper we demonstrate how systems used for managing data during research can be extended with preservation functions using process engines that run pre-defined preservation workflows. We also show a prototype of a machine-actionable data management plan that is automatically generated during this process to document the actions performed. Thus, we break the traditional distinction between platforms for managing data during research and repositories used for preservation afterwards. Furthermore, we show how researchers can comply with funder requirements more easily while reducing their effort.
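To make the idea concrete, the following is a minimal sketch of a pre-defined preservation workflow that records each action it performs in a machine-readable plan. It is not the authors' implementation: a plain Python list of steps stands in for a process engine, and the step names, the record fields, and the function run_preservation_workflow are hypothetical names chosen for illustration. The output only loosely follows JSON-LD conventions and does not reproduce the paper's data management plan model.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Each step receives a shared context dict and returns a short outcome text.
# The step names are hypothetical; a real process engine would load them
# from a workflow definition instead of a Python list.
def step_compute_checksum(ctx: dict) -> str:
    ctx["checksum"] = sha256_of(ctx["source"])
    return "computed sha256 of source file"


def step_copy_to_archive(ctx: dict) -> str:
    ctx["archive_dir"].mkdir(parents=True, exist_ok=True)
    ctx["archived"] = ctx["archive_dir"] / ctx["source"].name
    shutil.copy2(ctx["source"], ctx["archived"])
    return "copied source file to archive directory"


def step_verify_copy(ctx: dict) -> str:
    if sha256_of(ctx["archived"]) != ctx["checksum"]:
        raise RuntimeError("archived copy does not match source checksum")
    return "verified archived copy against source checksum"


WORKFLOW = [step_compute_checksum, step_copy_to_archive, step_verify_copy]


def run_preservation_workflow(source: Path, archive_dir: Path) -> dict:
    """Run every step in order and return a record of the actions performed."""
    ctx = {"source": source, "archive_dir": archive_dir}
    actions = []
    for step in WORKFLOW:
        outcome = step(ctx)
        actions.append({
            "action": step.__name__,
            "outcome": outcome,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
    # Hypothetical record layout; only the Dublin Core prefix is a real vocabulary.
    return {
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        "dcterms:title": source.name,
        "checksum_sha256": ctx["checksum"],
        "preserved_at": str(ctx["archived"]),
        "actions": actions,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "results.csv"
        data_file.write_text("sample,value\na,1\n")
        plan = run_preservation_workflow(data_file, Path(tmp) / "archive")
        print(json.dumps(plan, indent=2))
```

Because the record is produced as a side effect of running the workflow, the researcher does not have to document the preservation actions separately, which is the effort reduction the abstract describes.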

Notes

  1. https://www.rd-alliance.org/groups/dmp-common-standards-wg

  2. https://osf.io/

  3. https://www.alfresco.com/

  4. http://www.omg.org/bpmn/

  5. IBM Business Process Manager: https://www.ibm.com/us-en/marketplace/business-process-manager

  6. A JSON format supporting the use of ontologies: https://json-ld.org/

  7. http://bpmn.io/

Acknowledgments

This research was carried out in the context of the Austrian COMET K1 program and publicly funded by the Austrian Research Promotion Agency (FFG) and the Vienna Business Agency (WAW).

Corresponding author

Correspondence to Tomasz Miksa.

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Bakos, A., Miksa, T., Rauber, A. (2018). Research Data Preservation Using Process Engines and Machine-Actionable Data Management Plans. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_6

  • DOI: https://doi.org/10.1007/978-3-030-00066-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00065-3

  • Online ISBN: 978-3-030-00066-0

  • eBook Packages: Computer Science, Computer Science (R0)
