Challenges of Research Data Management for High Performance Computing

  • Conference paper
In: Research and Advanced Technology for Digital Libraries (TPDL 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Abstract

This paper addresses the challenges of research data management with a focus on High Performance Computing (HPC) and simulation data. The main challenges discussed are the Big Data qualities of HPC research data, technical data management, and organizational and administrative challenges. From these challenges, requirements for feasible HPC research data management are derived and an alternative data life cycle is proposed. The requirements analysis includes recommendations based on a modified OAIS architecture: to meet the HPC requirement of a scalable system, metadata and data must not be stored together. Metadata keys are defined and organizational actions are recommended. Moreover, this paper introduces the role of a Scientific Data Manager, who is responsible for the institution’s data management and takes stewardship of the data.

Notes

  1. For example, the data produced by the CERN/LHC experiments is distributed to data centers all over Europe. See: https://home.cern/about/computing/worldwide-lhc-computing-grid, last accessed Nov 28th 2016.

  2. http://handle.net/, last accessed Nov 26th 2016.

  3. http://www.pidconsortium.eu/, last accessed Nov 26th 2016.

  4. There are online tools available for specifying a DMP, such as: https://dmponline.dcc.ac.uk/, last accessed Nov 25th 2016.

Acknowledgments

We would like to thank Wanda Spahn for proofreading.

Author information

Corresponding author

Correspondence to Björn Schembera.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schembera, B., Bönisch, T. (2017). Challenges of Research Data Management for High Performance Computing. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_12

  • DOI: https://doi.org/10.1007/978-3-319-67008-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67007-2

  • Online ISBN: 978-3-319-67008-9

  • eBook Packages: Computer Science, Computer Science (R0)
