Abstract
In the last decades, Knowledge Graph (KG) empowered analytics have been used to extract advanced insights from data. Several companies have integrated legacy relational databases with semantic technologies using Ontology-Based Data Access (OBDA). In practice, this approach lets analysts write SPARQL queries over both KGs and SQL relational data sources while hiding most of the implementation details. However, the volume of data is continuously increasing, and a growing number of companies are adopting distributed storage platforms and distributed computing engines, leaving a gap between big data and semantic technologies. Ontop, one of the reference OBDA systems, is limited to legacy relational databases, and compatibility with the big data analytics engine Apache Spark is still missing. This paper introduces Chimera, an open-source software suite that aims to fill this gap. Chimera enables a new type of round-tripping data science pipeline: data scientists can query data stored in a data lake using SPARQL through Ontop and SparkSQL, and save the semantic results of the analysis back into the data lake. Such pipelines semantically enrich data from Spark before saving them back.
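The round-tripping idea can be illustrated with a toy sketch. It is not Chimera's or Ontop's API: the mapping table, the `ex:` identifiers, the table names, and all function names are hypothetical, and an in-memory SQLite database stands in for the data lake and SparkSQL. The sketch only shows the general OBDA pattern of rewriting a triple pattern into SQL through a mapping, evaluating it, and storing the resulting triples back next to the source data.

```python
import sqlite3

# Hypothetical OBDA-style mapping: each RDF predicate is backed by a SQL
# query that yields (subject, object) pairs. All names are illustrative.
MAPPINGS = {
    "ex:hasReading": (
        "SELECT 'ex:sensor/' || sensor_id AS s, value AS o "
        "FROM readings ORDER BY sensor_id"
    ),
}


def rewrite(predicate):
    """Rewrite the triple pattern `?s <predicate> ?o` into SQL via the mapping."""
    return MAPPINGS[predicate]


def answer(conn, predicate):
    """Evaluate the rewritten SQL and return the answers as (s, p, o) triples."""
    rows = conn.execute(rewrite(predicate)).fetchall()
    return [(s, predicate, o) for s, o in rows]


def round_trip(conn, predicate):
    """Answer the pattern, then save the triples back into the same store."""
    triples = answer(conn, predicate)
    conn.executemany("INSERT INTO enriched VALUES (?, ?, ?)", triples)
    return triples


def build_demo_db():
    """Create an in-memory database standing in for the data lake (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE readings (sensor_id INTEGER, value REAL);
        INSERT INTO readings VALUES (1, 20.5), (2, 19.0);
        CREATE TABLE enriched (s TEXT, p TEXT, o TEXT);
        """
    )
    return conn
```

Calling `round_trip(build_demo_db(), "ex:hasReading")` returns one triple per row of `readings` and also persists those triples in the `enriched` table. A real OBDA system handles full SPARQL, ontologies, and query optimisation, none of which this sketch attempts.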
Notes
- 1. Ontop project: https://ontop-vkg.org; Ontopic company: https://ontopic.biz/.
- 5. Chimera supports several Spark versions, from 2.4.0 to 3.1.1. Users can change the version by selecting the appropriate image tag.
Acknowledgements
This work has been partially financed by the Research Fund for the Italian Electrical System in compliance with the Decree of the Minister of Economic Development of April 16, 2018. We also thank Marco Balduini for initiating Chimera.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Belcao, M., Falzone, E., Bionda, E., Valle, E.D. (2021). Chimera: A Bridge Between Big Data Analytics and Semantic Technologies. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_27
DOI: https://doi.org/10.1007/978-3-030-88361-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science (R0)