Abstract
Different scientific communities produce different kinds of datasets that rely on different data descriptions, approaches, and logical organisations. In such an environment, it is essential to establish a knowledge communication framework that can guarantee certain fundamentals, such as the inclusive description and documentation of interdisciplinary digital resources and their long-term preservation, access, use, and reuse. Semantic knowledge integration aims to overcome this inhomogeneity between data produced by different research communities, specifically the communities aggregated within the e-Infrastructure developed by the European project VI-SEEM: Life Science, Climate Science, and Digital Cultural Heritage. The present research proposes a framework based on CIDOC CRM and its extensions, in particular CRMsci and CRMdig, and tests it on examples identified as interdisciplinary with respect to the various research areas of the project. Moreover, the semantic solution aims to fulfil the FAIR principles.
Notes
The original draft of the FAIR Guiding Principles can be found at https://www.force11.org/group/fairgroup/fairprinciples.
Beyond the fact that several non-European countries are involved in European Research Infrastructure projects (see the data, as of November 2018, on International Cooperation among Environmental Research Infrastructures in Horizon 2020: http://www.coop-plus.eu/), it is worth citing the initiatives promoted by governments and science foundations in some of those countries. For example, the US National Science Foundation, whose mission is to make advanced equipment, facilities, and shared cyber-infrastructures broadly available to the entire research community, supports several research infrastructures on different subjects (https://www.nsf.gov/news/nsf09013/index.jsp); in South America, the Latin American Strategy Forum for Large-Scale Research Infrastructures (https://sites.google.com/view/lastrategyforum/home) is working towards the establishment of research infrastructures, especially for particle physics and cosmology; in Australia, the National Collaborative Research Infrastructure Strategy (NCRIS) (https://docs.education.gov.au/node/43936) of the Ministry for Education invests in eResearch Infrastructures, such as the Data LifeCycle Framework (DLCF), a nationwide strategy that connects research resources and activities to improve data discovery, storage and reuse (https://www.dlc.edu.au/about).
The Open Biomedical and Biological Ontologies Foundry. http://obofoundry.org [18].
Basic Formal Ontology 2.0: Draft Specification and User’s Guide. http://bfo.googlecode.com/svn/trunk/docs/bfo2-reference/BFO2-Reference.docx.
The survey consisted of 16 questions investigating the solutions adopted by each institution in the three fields, e.g. the quantity and type of data and the scientific domain; the metadata formats, schemas, ontologies and controlled vocabularies used; the use of PIDs; subjects and languages; the data storage service, etc.
The VI-SEEM repository is built on DSpace, an open-source repository software package typically used for creating open access repositories for scholarly and/or published digital content (https://github.com/DSpace/DSpace).
The problem has already been addressed in [14] with regard to the use of the Ecological Metadata Language (EML) schema and Darwin Core for data documentation in the ecological field.
CRMsci, the Scientific Observation Model, is a formal ontology that extends CIDOC CRM. It is intended to be used as a global schema for integrating metadata about scientific observations, measurements and processed data in descriptive and empirical sciences, such as biodiversity, geology, geography, archaeology and cultural heritage conservation, in research IT environments and research data libraries (http://cidoc-crm.org/crmsci/).
CRMdig is a compatible extension of CIDOC CRM for encoding metadata about the steps and methods of production (‘provenance’) of digitization products and digital representations, such as 2D and 3D models created by various technologies.
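The two notes above describe CRMsci and CRMdig only in outline. As a minimal, hypothetical sketch (not the mapping actually used in the VI-SEEM framework), the Python below expresses an observation of a soil sample and the digitisation event that produced an image about that sample as RDF triples. It uses identifiers from CIDOC CRM (P129 is about), CRMsci (S4 Observation, O8 observed, S15 Observable Entity) and CRMdig (D7 Digital Machine Event, L11 had output, D1 Digital Object). The namespace IRIs, the underscore-style local names and every `ex:` resource are assumptions to be checked against the official specifications, and the serialiser is a hand-written N-Triples writer, not a library API.

```python
"""Hypothetical sketch: a sample observation (CRMsci) and its digitisation
(CRMdig) linked through CIDOC CRM, serialised as N-Triples."""

from typing import List, Tuple

Triple = Tuple[str, str, str]

# Namespace IRIs are assumptions; verify them against the official releases.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "crmsci": "http://www.ics.forth.gr/isl/CRMsci/",
    "crmdig": "http://www.ics.forth.gr/isl/CRMdig/",
    "ex": "http://example.org/vi-seem/",  # hypothetical instance namespace
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'crm:P129_is_about' into an <IRI>."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise (subject, predicate, object) prefixed-name tuples as N-Triples."""
    return "\n".join(
        f"{expand(s)} {expand(p)} {expand(o)} ." for s, p, o in triples
    )


# Hypothetical records: both the observation and the digital object point at
# the same sample, so the two descriptions share a common node.
TRIPLES: List[Triple] = [
    ("ex:observation1", "rdf:type", "crmsci:S4_Observation"),
    ("ex:observation1", "crmsci:O8_observed", "ex:soilSample1"),
    ("ex:soilSample1", "rdf:type", "crmsci:S15_Observable_Entity"),
    ("ex:digitisation1", "rdf:type", "crmdig:D7_Digital_Machine_Event"),
    ("ex:digitisation1", "crmdig:L11_had_output", "ex:image1"),
    ("ex:image1", "rdf:type", "crmdig:D1_Digital_Object"),
    ("ex:image1", "crm:P129_is_about", "ex:soilSample1"),
]

if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Because the observation and the digital object both refer to `ex:soilSample1`, a query starting from the sample can reach the scientific record and the digitisation provenance in one graph; a production setup would more likely load the published RDFS files and use an RDF library rather than this hand-rolled writer.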
References
[1] Alipova, K.A., Bart, A.A., Fazliev, A.Z., Gordov, E.P., Okladnikov, I.G., Privezentsev, A.I., Titov, A.G.: Systematization of climate data in the virtual research environment on the basis of ontology approach. In: Proceedings of the SPIE 10466, 23rd International Symposium on Atmospheric and Ocean Optics: Atmospheric Physics (2017). https://doi.org/10.1117/12.2289761
[2] Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25(1), 25–29 (2000). https://doi.org/10.1038/75556
[3] Bart, A.A., Churuksaeva, V.V., Fazliev, A.Z., Privezentsev, A.I., Gordov, E.P., Okladnikov, I.G., Titov, A.G.: Ontological description of meteorological and climate data collections. In: Proceedings of the XIX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2017), Moscow, Russia, October 10–13, 2017 (2017). http://ceur-ws.org/Vol-2022/paper43.pdf
[4] Boeckhout, M., Zielhuis, G.A., Bredenoord, A.L.: The FAIR guiding principles for data stewardship: fair enough? Eur. J. Hum. Genet. 26(7), 931–936 (2018)
[5] Buttigieg, P.L., Morrison, N., Smith, B., Mungall, C.J., Lewis, S.E.: The environment ontology: contextualising biological and biomedical entities. J. Biomed. Semant. 4, 43 (2013). https://doi.org/10.1186/2041-1480-4-43
[6] Buttigieg, P.L., Pafilis, E., Lewis, S.E., Schildhauer, M.P., Walls, R.L., Mungall, C.J.: The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation. J. Biomed. Semant. 7, 57 (2016). https://doi.org/10.1186/s13326-016-0097-6
[7] de Jong, F., Maegaard, B., De Smedt, K., Fišer, D., Van Uytvanck, D.: CLARIN: towards FAIR and responsible data science using language resources. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 3259–3264 (2018)
[8] Diepenbroek, M., Grobe, H., Reinke, M., Schindler, U., Schlitzer, R., Sieger, R., Wefer, G.: PANGAEA - an information system for environmental sciences. Comput. Geosci. 28(10), 1201–1210 (2002). https://doi.org/10.1016/S0098-3004(02)00039-0
[9] Diepenbroek, M., Schindler, U., Huber, R., Pesant, S., Stocker, M., Felden, J., Buss, M., Weinrebe, M.: Terminology supported archiving and publication of environmental science data in PANGAEA. J. Biotechnol. 261, 177–186 (2017)
[10] Gregory, J.: The CF metadata standard, pp. 1–5 (2003). http://cfconventions.org/Data/cf-documents/overview/article.pdf. Accessed Jan 2020
[11] Hinrichs, E., Krauwer, S.: The CLARIN research infrastructure: resources and tools for E-humanities scholars. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), pp. 1525–1531 (2014)
[12] Holliday, V.T.: Soils in Archaeological Research. Oxford University Press, New York (2004)
[13] Jeffery, K.: The CERIF model as the core of a research organization. Data Sci. J. 9, 7–13 (2010)
[14] Madin, J., Bowers, S., Schildhauer, M., Krivov, S., Pennington, D., Villa, F.: An ontology for describing and synthesizing ecological observation data. Ecol. Inform. 2(3), 279–296 (2007)
[15] Mattern, S.: The big data of ice, rocks, soils, and sediments. Places J. 1, 1 (2017). https://doi.org/10.22269/171107
[16] Rosati, I., Bergami, C., Fiore, N., Oggioni, A., Tagliolato, P.: LifeWatch Italy Thesauri Documentation. CNR Edizioni, Rome (2017). ISBN 9788880802495
[17] Schentz, H., Peterseil, J., Mirtl, M.: Semantics for the ‘Long Term Ecological Researchers’, Share-PSI 2.0. Berlin/Papers, Papers/abstracts for plenary presentations, P6 (2015)
[18] Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L.J., Eilbeck, K., Ireland, A., Mungall, C.J., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.A., Scheuermann, R.H., Shah, N., Whetzel, P.L., Lewis, S.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25, 1251–1255 (2007). https://doi.org/10.1038/nbt134
[19] Smith, B. (ed.): Proceedings of the International Conference on Biomedical Ontology (ICBO) (2009)
[20] van der Werf, D.C., Adamescu, M., Ayromlou, M., Bertrand, N., Borovec, J., Boussard, H., Cazacu, C., van Daele, T., Datcu, S., Frenzel, M., Hammen, V., Karasti, H., Kertesz, M., Kuitunen, P., Lane, M., Lieskovsky, J., Magagna, B., Peterseil, J., Rennie, S., Schentz, H., Schleidt, K., Tuominen, L.: SERONTO: a socio-ecological research and observation ontology. In: Proceedings of TDWG 2008, Fremantle, Australia, 17–25 October 2008 (2008)
[21] Wieczorek, J., Bloom, D., Guralnick, R., Blum, S., Döring, M., De Giovanni, R., Robertson, T., Vieglais, D.: Darwin Core: an evolving community-developed biodiversity data standard. PLoS ONE 7, 1 (2012)
[22] Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., Bonino da Silva Santos, L., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., ’t Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., Mons, B.: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Acknowledgements
This work was developed within VI-SEEM, the Virtual Research Environment (VRE) in Southeast Europe and the Eastern Mediterranean (SEEM) project, supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 675121. The authors would like to thank Dr. Andreas Athenodorou and Dr. Georgios Artopoulos (the Cyprus Institute) for the opportunity to work on this topic. The authors are grateful to the anonymous reviewers for their helpful and constructive comments.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Vassallo, V., Felicetti, A. Towards an ontological cross-disciplinary solution for multidisciplinary data: VI-SEEM data management and the FAIR principles. Int J Digit Libr 22, 297–307 (2021). https://doi.org/10.1007/s00799-020-00285-5