Integration of Data on Substance Properties Using Big Data Technologies and Domain-Specific Ontologies

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 822)

Abstract

A new technology is proposed for storing and categorizing heterogeneous data on the properties of matter. The availability of a multitude of heterogeneous data from a variety of sources justifies the use of Apache Spark, one of the popular toolkits for Big Data processing. Its role in the proposed technology is to manage an extensive data warehouse held in text files in the JSON format. The first stage of the technology converts primary resources (relational databases, digital archives, Web portals, etc.) into a standardized JSON document form. The advantages of the JSON format are the ability to store data and metadata within a single text document, readability by both humans and computers, and support for the hierarchical structures needed to represent complex and irregular data. Such structures arise from the possible expansion of the subject area: new types of materials, an extended nomenclature of properties, and so on. A repository of subject-oriented ontologies is used for the semantic integration of the resources converted to JSON. Data in the JSON document store are searched through a combination of SPARQL and SQL queries. The first (addressed to the ontology repository) lets the user view and search for matching and related concepts. The second, addressed to the sets of JSON documents, retrieves the required data from the document bodies using the capabilities of Apache Spark SQL. The efficiency of the developed technology is tested on the integration of thermophysical data, which are characterized by a complex logical structure.
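The conversion stage can be illustrated with a minimal Python sketch. It assumes a hypothetical relational table `measurement` (the table, column and key names are illustrative, not the authors' schema) and regroups its flat rows into one hierarchical document per substance, with metadata stored alongside the data. Python's `sqlite3` stands in for a real source DBMS, and the numbers are placeholders, not property data.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical flat table standing in for a "primary resource" (relational database).
# Table and column names are illustrative assumptions, not taken from the paper.
CREATE_SQL = (
    "CREATE TABLE measurement (substance TEXT, property TEXT, unit TEXT, "
    "temperature_K REAL, value REAL, source TEXT)"
)


def rows_to_documents(conn, origin):
    """Regroup flat relational rows into one hierarchical JSON-ready dict per substance."""
    cur = conn.execute(
        "SELECT substance, property, unit, temperature_K, value, source "
        "FROM measurement ORDER BY substance, property, temperature_K"
    )
    # substance -> (property, unit, source) -> list of data points
    grouped = defaultdict(lambda: defaultdict(list))
    for substance, prop, unit, temp, value, src in cur:
        grouped[substance][(prop, unit, src)].append({"T_K": temp, "value": value})

    documents = []
    for substance, tables in grouped.items():
        documents.append({
            # Metadata is stored in the same document as the data it describes.
            "metadata": {"origin": origin, "schema": "hypothetical-v0"},
            "substance": substance,
            # Nested list: new properties or extra fields need no change to a fixed schema.
            "properties": [
                {"name": prop, "unit": unit, "source": src, "points": points}
                for (prop, unit, src), points in tables.items()
            ],
        })
    return documents


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_SQL)
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)",
        [  # placeholder numbers, not real property data
            ("substance_A", "density", "kg/m^3", 300.0, 1.0, "src-1"),
            ("substance_A", "density", "kg/m^3", 400.0, 2.0, "src-1"),
            ("substance_A", "heat capacity", "J/(mol*K)", 300.0, 3.0, "src-2"),
        ],
    )
    # One JSON object per line (JSON Lines).
    for doc in rows_to_documents(conn, "hypothetical_relational_db"):
        print(json.dumps(doc))


if __name__ == "__main__":
    demo()
```

Writing one document per line produces JSON Lines output, the layout that Spark's `spark.read.json` expects by default; documents spanning a whole file would need the `multiLine` option instead.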

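The two-step search (ontology lookup, then SQL over the JSON documents) can be sketched as follows, again in Python and under stated assumptions: a hypothetical in-memory list of triples replaces the ontology repository (in practice step 1 would be a SPARQL query, for example one following SKOS `skos:narrower` links), and SQLite's JSON functions `json_extract` and `json_each` stand in for Apache Spark SQL. The concept names and document layout are illustrative only.

```python
import json
import sqlite3

# Hypothetical mini-ontology as (subject, predicate, object) triples. In the described
# setup this lookup would be a SPARQL query to the ontology repository; here it is a graph walk.
TRIPLES = [
    ("heat capacity", "narrower", "isobaric heat capacity"),
    ("heat capacity", "narrower", "isochoric heat capacity"),
    ("isobaric heat capacity", "altLabel", "Cp"),
    ("density", "related", "specific volume"),
]


def expand_concept(term, predicates=("narrower", "altLabel")):
    """Step 1: collect the term plus every concept reachable through the given links."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(o for s, p, o in TRIPLES if s == current and p in predicates)
    return found


def find_properties(conn, concepts):
    """Step 2: SQL over stored JSON documents, matching property names to the concepts."""
    if not concepts:
        return []
    placeholders = ", ".join("?" for _ in concepts)
    sql = (
        "SELECT json_extract(d.body, '$.substance'), "
        "json_extract(p.value, '$.name'), json_extract(p.value, '$.unit') "
        "FROM docs AS d, json_each(d.body, '$.properties') AS p "
        f"WHERE json_extract(p.value, '$.name') IN ({placeholders}) "
        "ORDER BY 1, 2"
    )
    # Concept names are bound as parameters, never interpolated into the SQL text.
    return conn.execute(sql, sorted(concepts)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (body TEXT)")
    doc = {
        "substance": "substance_A",
        "properties": [
            {"name": "isobaric heat capacity", "unit": "J/(mol*K)", "points": []},
            {"name": "density", "unit": "kg/m^3", "points": []},
        ],
    }
    conn.execute("INSERT INTO docs VALUES (?)", (json.dumps(doc),))
    concepts = expand_concept("heat capacity")
    print(sorted(concepts))
    print(find_properties(conn, concepts))
```

Expanding "heat capacity" to its narrower concepts and synonyms lets the SQL step find documents that mention only "isobaric heat capacity" or "Cp". With Spark, the second step would typically load the JSON Lines files with `spark.read.json` and query them through a temporary view; the ontology step is unchanged. The SQLite JSON functions used here are built in by default since SQLite 3.38.0.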
Notes

  1. Section “Databases at the Joint Institute for High Temperatures, Russian Academy of Sciences” on page 47 of the review [13].

References

  1. WhatIs.com (a reference and self-education tool about information technology). http://whatis.techtarget.com/definition/3Vs

  2. Erkimbaev, A.O., Zitserman, V.Y., Kobzev, G.A., Kosinov, A.V.: Standardization of Storage and Retrieval of Semi-structured Thermophysical Data in JSON-documents Associated with the Ontology. In: CEUR-WS, vol. 2022, urn:nbn:de:0074-2022-6 (2017). http://ceur-ws.org/Vol-2022/paper36.pdf

  3. Frenkel, M., Chirico, R.D., Diky, V., et al.: XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML). Pure Appl. Chem. 78, 541–612 (2006). https://doi.org/10.1351/pac200678030541

  4. Sturrock, C.P., Begley, E.F., Kaufman, J.G.: NISTIR 6785. MatML – Materials Markup Language Workshop Report, U.S. Department of Commerce. National Institute of Standards and Technology (2001)

  5. Introducing JSON. http://json.org/index.html

  6. Michel, K., Meredig, B.: Beyond bulk single crystals: A data format for all materials structure–property–processing relationships. MRS Bull. 41, 617–623 (2016). https://doi.org/10.1557/mrs.2016.166

  7. Ontobee: A linked data server designed for ontologies. http://www.ontobee.org

  8. Erkimbaev, A.O., Zhizhchenko, A.B., Zitserman, V.Yu, Kobzev, G.A., Son, E.E., Sotnikov, A.N.: Integration of databases on substance properties: approaches and technologies. Autom. Documentation Math. Linguist. 46, 170–176 (2012). https://doi.org/10.3103/S000510551204005X

  9. Ataeva, O.M., Erkimbaev, A.O., Zitserman, V.Yu. et al.: Ontological Modeling as a Means of Integration Data on Substances Thermophysical Properties. In: 15th All-Russian Science Conference “Electronic Libraries: Advanced Approaches and Technologies, Electronic Collections”, s1_3. Yaroslavl (2013). http://rcdl.ru/doc/2013/paper/s1_3.pdf

  10. ChemSpider. http://www.chemspider.com

  11. Hall, S.R., McMahon, B.: The implementation and evolution of STAR/CIF ontologies: interoperability and preservation of structured data. Data Sci. J. 15(3), 1–15 (2016). https://doi.org/10.5334/dsj-2016-003

  12. Apache Spark. http://spark.apache.org

  13. Kiselyova, N.N., Dudarev, V.A., Zemskov, V.S.: Computer information resources of inorganic chemistry and materials science. Rus. Chem. Rev. 79, 145–166 (2010). https://doi.org/10.1070/RC2010v079n02ABEH004104

  14. Frenkel, M.: Global communications and expert systems in thermodynamics: Connecting property measurement and chemical process design. Pure Appl. Chem. 77, 1349–1367 (2005). https://doi.org/10.1351/pac200577081349

  15. Belov, G.V., Iorish, V.S., Yungman, V.S.: IVTANTHERMO for Windows-database on thermodynamic properties and related software. Calphad 23, 173–180 (1999). https://doi.org/10.1016/s0364-5916(99)00023-1

Acknowledgments

This work was supported by the Russian Science Foundation, grant 14-50-00124.

Author information

Corresponding author

Correspondence to Adilbek Erkimbaev.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Erkimbaev, A., Zitserman, V., Kobzev, G., Kosinov, A. (2018). Integration of Data on Substance Properties Using Big Data Technologies and Domain-Specific Ontologies. In: Kalinichenko, L., Manolopoulos, Y., Malkov, O., Skvortsov, N., Stupnikov, S., Sukhomlin, V. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2017. Communications in Computer and Information Science, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-319-96553-6_14

  • DOI: https://doi.org/10.1007/978-3-319-96553-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96552-9

  • Online ISBN: 978-3-319-96553-6

  • eBook Packages: Computer Science, Computer Science (R0)
