Abstract
Data lakes enable flexible knowledge discovery and reduce the overhead of materialized data integration. Albeit effective for data storage, query execution over data lakes may be expensive, being demanded novel techniques to generate plans able to exploit the main characteristics of data lakes. We devise Ontario, a federated query processing approach tailored for large-scale heterogeneous data. Ontario provides efficient and effective query processing over a federation of heterogeneous data sources in a data lake. Ontario resorts to source descriptions named RDF Molecule Templates, i.e., abstract descriptions of the properties of the entities in a unified schema and their implementation in a data lake. We empirically evaluate the effectiveness of the Ontario optimization techniques over state-of-the-art benchmarks. The observed results suggest that Ontario can effectively select plans composed of subqueries that can be efficiently executed against heterogeneous data sources in a data lake.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: an adaptive query processing engine for SPARQL endpoints. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_2
Belleau, F., Nolin, M.-A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inf. 41(5), 706–716 (2008)
Duggan, J., et al.: The BigDAWG polystore system. SIGMOD Rec. 44(2), 11–16 (2015)
Endris, K.M., Galkin, M., Lytra, I., Mami, M.N., Vidal, M.-E., Auer, S.: Querying interlinked data by bridging RDF molecule templates. TLDKS 39, 1–42 (2018)
Golshan, B., Halevy, A.Y., Mihaila, G.A., Tan, W.: Data integration: after the teenage years. In: 2017 Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp. 101–106 (2017)
Hasnain, A., et al.: BioFed: federated query processing over life sciences linked open data. J. Biomed. Seman. 8(1), 13:1–13:19 (2017)
Khan, Y., Zimmermann, A., Jha, A., Gadepally, V., D’Aquin, M., Sahay, R.: One size does not fit all: querying web polystores. IEEE Access 7, 9598–9617 (2019)
Maali, F., Cyganiak, R., Peristeras, V.: A publishing pipeline for linked government data. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 778–792. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30284-8_59
Mami, M.N., Scerri, S., Auer, S., Vidal, M.-E.: Towards semantification of big data technology. In: Madria, S., Hara, T. (eds.) DaWaK 2016. LNCS, vol. 9829, pp. 376–390. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43946-4_25
Quix, C., Hai, R., Vatov, I.: GEMMS: a generic and extensible metadata management system for data lakes. In: 2016 28th International Conference on Advanced Information Systems Engineering CAiSE, pp. 129–136 (2016)
Samwald, M., et al.: Linked open drug data for pharmaceutical research and development. J. Cheminformatics 3(1), 19 (2011)
Scharffe, F., et al.: Enabling linked data publication with the Datalift platform. In: AAAI 2012, 26th Conference on Artificial Intelligence, W10: Semantic Cities, Toronto, Canada, July 2012
Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: optimization techniques for federated query processing on linked data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_38
Vidal, M.-E., Ruckhaus, E., Lampo, T., MartÃnez, A., Sierra, J., Polleres, A.: Efficiently joining group patterns in SPARQL queries. In: Aroyo, L., et al. (eds.) ESWC 2010. LNCS, vol. 6088, pp. 228–242. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13486-9_16
Walker, C., Alrehamy, H.: Personal data lake with data gravity pull. In: 2015 IEEE Fifth International Conference on Big Data and Cloud Computing, BDCLOUD 2015, pp. 160–167, Washington, DC, USA. IEEE Computer Society (2015)
Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. PVLDB 1(1), 1008–1019 (2008)
Acknowledgement
This work has been partially supported by the EU H2020 RIA funded project iASiS with grant agreement No 727658.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Endris, K.M., Rohde, P.D., Vidal, ME., Auer, S. (2019). Ontario: Federated Query Processing Against a Semantic Data Lake. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_29
Download citation
DOI: https://doi.org/10.1007/978-3-030-27615-7_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer ScienceComputer Science (R0)