Chimera: A Bridge Between Big Data Analytics and Semantic Technologies

  • Conference paper
The Semantic Web – ISWC 2021 (ISWC 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Abstract

In recent decades, Knowledge Graph (KG) empowered analytics have been used to extract advanced insights from data. Several companies have integrated legacy relational databases with semantic technologies using Ontology-Based Data Access (OBDA). In practice, this approach lets analysts write SPARQL queries over both KGs and SQL relational data sources while hiding most of the implementation details. However, data volumes keep growing, and a growing number of companies are adopting distributed storage platforms and distributed computing engines. This leaves a gap between big data and semantic technologies: Ontop, one of the reference OBDA systems, is limited to legacy relational databases, and compatibility with the big data analytics engine Apache Spark is still missing. This paper introduces Chimera, an open-source software suite that aims to fill this gap. Chimera enables a new type of round-tripping data science pipeline: data scientists can query data stored in a data lake using SPARQL through Ontop and SparkSQL, and save the semantic results of the analysis back in the data lake. Such pipelines semantically enrich data from Spark before saving it back.
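To make the query side of such a pipeline more concrete, the following is a minimal sketch of how a client might address a SPARQL endpoint and flatten the answers into rows that could later be written back to a data lake. It relies only on the W3C SPARQL 1.1 Protocol and the SPARQL 1.1 JSON results format, not on Chimera's own libraries. The endpoint URL, the `ex:` vocabulary, and the helper names `build_sparql_request` and `bindings_to_rows` are assumptions made for illustration; they do not come from the paper. The request is only built, never sent.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Hypothetical endpoint address; replace with the SPARQL endpoint of your deployment.
ENDPOINT = "http://localhost:8080/sparql"

# Hypothetical vocabulary, used only for illustration.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?pizza ?price WHERE { ?pizza ex:price ?price } LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results serialization.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> list:
    """Flatten a SPARQL 1.1 JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    # Unbound variables are simply omitted from the corresponding row.
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]
```

Rows in this shape are straightforward to hand to a DataFrame library before persisting them; that step depends on the deployment and is left out here.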

Notes

  1. Ontop project: https://ontop-vkg.org, Ontopic company: https://ontopic.biz/.

  2. https://chimera-suite.github.io/.

  3. https://www.w3.org/TR/sparql11-federated-query/.

  4. https://github.com/chimera-suite/infrastructure.

  5. Chimera supports several Spark versions, starting from 2.4.0 to 3.1.1. Users can change the version by selecting the appropriate image tags.

  6. https://parquet.apache.org/.

  7. https://hadoop.apache.org/.

  8. https://spark.apache.org/sql/.

  9. https://jena.apache.org/documentation/fuseki2/.

  10. https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/.

  11. https://www.w3.org/2001/sw/rdb2rdf/wiki.

  12. https://ontop-vkg.org/dev/db-adapter.html#required-implementations.

  13. https://hub.docker.com/r/chimerasuite/ontop.

  14. https://github.com/chimera-suite/OntopSpark.

  15. https://spark.apache.org/docs/3.0.1/api/python/.

  16. https://sparqlwrapper.readthedocs.io/.

  17. https://rdflib.readthedocs.io/.

  18. https://pypi.org/project/PySPARQL/.

  19. https://github.com/chimera-suite/PySPARQL.

  20. https://hub.docker.com/r/chimerasuite/jupyter-notebook.

  21. https://protege.stanford.edu/ontologies/pizza/pizza.owl.

  22. https://github.com/chimera-suite/use-case.

  23. http://www.rse-web.it/.

  24. https://www.unareti.it/.

  25. https://databricks.com/.

  26. https://github.com/chimera-suite/OntopSpark-evaluation.

  27. https://github.com/SANSA-Stack/SANSA-Stack/tree/develop/sansa-query/sansa-query-spark/src/test/resources/datalake.

  28. http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/.

  29. https://chimera-suite.github.io/.

  30. https://github.com/chimera-suite.

  31. https://github.com/ontop/ontop/pull/422.

  32. https://hub.docker.com/u/chimerasuite.

Acknowledgements

This work has been partially financed by the Research Fund for the Italian Electrical System in compliance with the Decree of the Minister of Economic Development of April 16, 2018. We also thank Marco Balduini for initiating Chimera.

Author information

Corresponding authors

Correspondence to Matteo Belcao, Emanuele Falzone, Enea Bionda or Emanuele Della Valle.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Belcao, M., Falzone, E., Bionda, E., Valle, E.D. (2021). Chimera: A Bridge Between Big Data Analytics and Semantic Technologies. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_27

  • DOI: https://doi.org/10.1007/978-3-030-88361-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88360-7

  • Online ISBN: 978-3-030-88361-4

  • eBook Packages: Computer Science (R0)
