A semantically enabled metadata repository for scientific data

  • Methodology Article
Earth Science Informatics

Abstract

The LASP Extended Metadata Repository (LEMR) is a semantically enabled repository of information (metadata) about the scientific datasets that LASP offers to the public. The repository lets us provide consistent, current, verified metadata to our users. It serves as a Single Source of Truth for this information, enabling more rigorous metadata management and addressing problems related to duplication of information. The linked open data aspect of the repository allows interlinking of concepts both within and across organizations and web sites. Associated interfaces allow users to browse and search the metadata. This information can be dynamically incorporated into web pages, so web page content is always up to date and consistent across the lab. With this information we can generate metadata records in a variety of schemas, such as ISO or SPASE, allowing federation with other organizations interested in our data. We leveraged open source technologies to build the repository and the dynamic web pages that read from it. VIVO, an open source semantic web application, provided key capabilities such as ontology and triple store management interfaces. AngularJS, an open source JavaScript framework for building dynamic web applications, was also invaluable in developing web pages that provide semantically enabled public interfaces to the metadata. In this paper we discuss our use of these tools and what we had to craft in order to meet our lab-specific needs.

Notes

  1. LISIRD, http://lasp.colorado.edu/lisird/.

  2. Fuseki is a server that provides the SPARQL protocol over HTTP. See http://jena.apache.org/documentation/serving_data/.

  3. SPARQL, a recursive acronym for ‘SPARQL Protocol and RDF Query Language’, is a query language that allows direct and rapid querying of a semantic database, here independently of the VIVO interface. See http://en.wikipedia.org/wiki/SPARQL.

  4. VIVO provides ‘facets’ based on types of concepts. For example, a ‘Person’ would have the facets ‘Faculty Member’ and ‘Student’, with ‘Student’ having the further facets ‘Undergraduate Student’ and ‘Graduate Student’. Users can drill down to more detailed results based on the selected sub-types (facets). See http://en.wikipedia.org/wiki/Faceted_search for more information on faceted search.

  5. See https://wiki.duraspace.org/display/VIVOSearch/VIVO+Multisite+Search.

  6. We currently use the default VIVO database, SDB, an Apache Jena RDF triplestore backed by a MySQL database. For more information on semantic triplestores, see http://en.wikipedia.org/wiki/Triplestore.

  7. The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. For more information, see http://en.wikipedia.org/wiki/Resource_Description_Framework.

  8. In computer and information science, an ontology is a machine encoding of terms, concepts, and the relations among them. See http://semanticweb.org/wiki/Ontology for more information.

  9. FOAF, http://xmlns.com/foaf/spec/.

  10. Bibliontology, http://bibliontology.com/.

  11. SKOS, http://www.w3.org/2009/08/skos-reference/skos.html.

  12. VCard, http://www.w3.org/TR/vcard-rdf/.

  13. See the SWEET ontology homepage, http://sweet.jpl.nasa.gov/.

  14. DCAT, http://www.w3.org/TR/vocab-dcat/.

  15. VIVO Harvester, https://wiki.duraspace.org/display/VIVO/VIVO+Harvester.

  16. Planned releases of VIVO add functionality such as a built-in SPARQL endpoint and a SPARQL-based update API. We will evaluate and adopt these capabilities as appropriate.

  17. AngularJS controllers are the ‘C’ in the MVC (Model, View, Controller) software design pattern.

  18. JSON is a lightweight data interchange format. See http://json.org.

  19. From a software development perspective, AngularJS is also powerful because of complementary tools for creating project skeletons (Yeoman), executing both unit and end-to-end tests (Grunt and Karma), and managing dependencies (Bower). There are many online resources, such as http://www.sitepoint.com/kickstart-your-angularjs-development-with-yeoman-grunt-and-bower/.

  20. ISO 19115, http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53798.

  21. LaTiS, https://github.com/dlindhol/LaTiS/wiki.

  22. The NeOn Project has identified six categories of ontology design patterns (ODPs): Structural, Correspondence, Content, Reasoning, Presentation, and Lexico-Syntactic. The NeOn Project aims to “advance the state of the art in using ontologies for large-scale semantic applications”. See http://ontologydesignpatterns.org/wiki/Main_Page for information about ODPs, and http://www.neon-project.org/nw/Welcome_to_the_NeOn_Project for information about NeOn.

  23. http://esipportal.cse.sc.edu:8084/ontologies.

  24. ORCID, http://orcid.org/.
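
The faceted drill-down described in note 4 can be illustrated with a minimal sketch. It assumes only the type hierarchy named in that note. The sample records, the `PARENT` table, and the function names are hypothetical and do not reflect how VIVO implements facets internally.

```python
from collections import defaultdict

# Child type -> parent type, using the example hierarchy from note 4.
PARENT = {
    "Faculty Member": "Person",
    "Student": "Person",
    "Undergraduate Student": "Student",
    "Graduate Student": "Student",
}


def ancestors(type_name):
    """Return type_name followed by all of its ancestors, nearest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def facet_counts(results, facet):
    """Count results under each direct sub-facet of `facet`."""
    counts = defaultdict(int)
    for _name, type_name in results:
        chain = ancestors(type_name)
        if facet in chain:
            idx = chain.index(facet)
            if idx > 0:  # idx == 0 means the result is typed as the facet itself
                counts[chain[idx - 1]] += 1
    return dict(counts)


def drill_down(results, facet):
    """Keep only the results whose type is `facet` or one of its sub-types."""
    return [name for name, type_name in results if facet in ancestors(type_name)]


# Hypothetical search results as (name, most specific type) pairs.
RESULTS = [
    ("Ada", "Graduate Student"),
    ("Ben", "Undergraduate Student"),
    ("Cy", "Faculty Member"),
    ("Di", "Graduate Student"),
]

if __name__ == "__main__":
    print(facet_counts(RESULTS, "Person"))
    print(drill_down(RESULTS, "Student"))
```

Selecting ‘Person’ shows counts for ‘Student’ and ‘Faculty Member’; selecting ‘Student’ narrows the result list and exposes the next level of facets.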

Acknowledgments

This work was supported with contributions from the NASA projects: Solar Radiation and Climate Experiment (SORCE), Multi-Satellite Ultraviolet Solar Spectral Irradiance Composite (MUSSIC), Total Solar Irradiance Center (TSIS), and Magnetospheric Multiscale (MMS).

Author information

Corresponding author

Correspondence to Anne Wilson.

Additional information

Communicated by: H. A. Babaie

Appendix

A. Example SPARQL query generated to retrieve mission metadata (datasets omitted for simplicity):

figure a

The results of the above query, which can be returned in multiple formats including JSON:

figure b
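
As a rough illustration of the pattern described in this appendix, the sketch below builds a SPARQL query for one mission and flattens a result document in the standard SPARQL JSON results format. It is not the query from the paper: the `ex:` prefix, the `ex:hasInstrument` property, the example.org URIs, and the sample result are all hypothetical placeholders, and no request is sent to a real endpoint.

```python
import json

# Hypothetical ontology namespace; not the vocabulary used by the repository.
EX_PREFIX = "http://example.org/ontology#"


def build_mission_query(mission_uri):
    """Build a SPARQL SELECT query for one mission's label and instruments."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"PREFIX ex: <{EX_PREFIX}>\n"
        "SELECT ?label ?instrument WHERE {\n"
        f"  <{mission_uri}> rdfs:label ?label .\n"
        f"  OPTIONAL {{ <{mission_uri}> ex:hasInstrument ?instrument . }}\n"
        "}"
    )


def flatten_bindings(results_json):
    """Convert a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound OPTIONAL variables are simply absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hypothetical response, shaped like the W3C SPARQL 1.1 JSON results format.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["label", "instrument"]},
    "results": {"bindings": [
        {
            "label": {"type": "literal", "value": "Example Mission"},
            "instrument": {"type": "uri", "value": "http://example.org/instrument/1"},
        },
        {"label": {"type": "literal", "value": "Example Mission"}},
    ]},
})

if __name__ == "__main__":
    print(build_mission_query("http://example.org/mission/1"))
    print(flatten_bindings(SAMPLE_RESULTS))
```

Flattening the bindings into plain dictionaries is one convenient way to hand the results to a web page template; a client could equally work with the raw bindings to keep the type information.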

About this article

Cite this article

Wilson, A., Cox, M., Elsborg, D. et al. A semantically enabled metadata repository for scientific data. Earth Sci Inform 8, 649–661 (2015). https://doi.org/10.1007/s12145-014-0175-1

  • DOI: https://doi.org/10.1007/s12145-014-0175-1
