Querying Interlinked Data by Bridging RDF Molecule Templates

Endris, Kemele M.; Galkin, Mikhail; Lytra, Ioanna; Mami, Mohamed Nadjib; Vidal, Maria-Esther; Auer, Sören

doi:10.1007/978-3-662-58415-6_1

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11310)

Abstract

Linked Data initiatives have encouraged the publication of a large number of RDF datasets created independently by different data providers. These datasets can be accessed through different Web interfaces, e.g., SPARQL endpoints; however, federated query engines are still required to provide an integrated view of them. Given the large number of Web-accessible RDF datasets, SPARQL federated query engines implement query processing techniques to effectively select the relevant datasets that provide the data required to answer a query. Existing federated query engines usually rely on coarse-grained description methods in which datasets are characterized by their vocabularies or schemas, while details about the data in the dataset, e.g., classes, properties, or relations, are ignored. This lack of source description may lead to the erroneous selection of data sources for a query and to unnecessary data retrieval and source communication, thus degrading the performance of query processing over the federation. We address the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of RDF molecule templates, i.e., abstract descriptions of entities belonging to the same RDF class, and uses them for source selection as well as query decomposition and optimization. We empirically study the performance and continuous efficiency of MULDER on existing benchmarks and compare it with existing federated SPARQL query engines. The experimental results suggest that RDF molecule templates empower MULDER and allow for a selection of RDF data sources that not only reduces execution time but also increases answer completeness and continuous efficiency.
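
To make the idea of template-based source selection concrete, the following is a minimal sketch and not MULDER's implementation. It assumes a deliberately simplified template that records only an RDF class, the endpoint that serves it, and the predicates its entities carry, and it assumes a source is relevant to a star-shaped subquery when one of its templates covers all of the subquery's predicates. The class name `MoleculeTemplate`, the function `select_sources`, the `ex:` vocabulary, and the endpoint URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeTemplate:
    """Hypothetical, simplified template: one RDF class served by one endpoint."""
    rdf_class: str
    endpoint: str
    predicates: frozenset


def select_sources(star_predicates, templates):
    """Return the endpoints whose template covers every predicate of a star-shaped subquery."""
    needed = set(star_predicates)
    # A template is a candidate only if it offers all predicates the subquery asks for.
    return sorted({t.endpoint for t in templates if needed <= t.predicates})


# Illustrative federation with two made-up endpoints.
templates = [
    MoleculeTemplate("ex:Drug", "http://example.org/drugs/sparql",
                     frozenset({"rdfs:label", "ex:target"})),
    MoleculeTemplate("ex:Gene", "http://example.org/genes/sparql",
                     frozenset({"rdfs:label", "ex:locus"})),
]

# Subquery: ?d rdfs:label ?name . ?d ex:target ?t .
selected = select_sources(["rdfs:label", "ex:target"], templates)
print(selected)
```

In this sketch, a description based only on shared vocabulary would consider both endpoints for `rdfs:label`, whereas matching the whole star against per-class templates narrows the choice to the endpoint that can answer the complete subquery.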

Notes

1. https://www.w3.org/TR/rdf11-concepts/.
2. https://www.w3.org/TR/2013/REC-sparql11-query-20130321/.
3. https://github.com/SDM-TIB/MulderTLDKS.
4. https://networkx.github.io/.
5. BSBM queries can be found in Appendix A.
6. The graph visualization was generated using the open source software platform Cytoscape (http://www.cytoscape.org/).
7. FedBench queries can be found at http://fedbench.fluidops.net/resource/Queries.
8. A lower number of connected components indicates stronger connectivity.
9. LSLOD queries can be found in Appendix A.

Acknowledgements

This work has been partially funded by the EU Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua) and by the EU H2020 programme for the projects BigDataEurope (GA 644564) and iASiS (GA 727658). Mikhail Galkin is supported by a scholarship from the German Academic Exchange Service (DAAD).

Author information

Authors and Affiliations

L3S Research Center, University of Hannover, Hannover, Germany
Kemele M. Endris, Maria-Esther Vidal & Sören Auer
Fraunhofer IAIS, Sankt Augustin, Germany
Mikhail Galkin, Ioanna Lytra, Mohamed Nadjib Mami, Maria-Esther Vidal & Sören Auer
Leibniz Information Centre For Science and Technology University Library (TIB), Hannover, Germany
Maria-Esther Vidal & Sören Auer
University of Bonn, Bonn, Germany
Ioanna Lytra
ITMO University, Saint Petersburg, Russia
Mikhail Galkin

Corresponding author

Correspondence to Kemele M. Endris.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Roland Wagner
IUT, University Lyon 1, Villeurbanne Cedex, France
Djamal Benslimane
University of Milan, Crema, Italy
Ernesto Damiani
University of Michigan-Dearborn, Dearborn, MI, USA
William I. Grosky

Appendices

A BSBM Queries

B LSLOD Queries

Cite this chapter

Endris, K.M., Galkin, M., Lytra, I., Mami, M.N., Vidal, M.-E., Auer, S. (2018). Querying Interlinked Data by Bridging RDF Molecule Templates. In: Hameurlain, A., Wagner, R., Benslimane, D., Damiani, E., Grosky, W. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIX. Lecture Notes in Computer Science, vol 11310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58415-6_1

DOI: https://doi.org/10.1007/978-3-662-58415-6_1
Published: 23 November 2018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58414-9
Online ISBN: 978-3-662-58415-6
eBook Packages: Computer Science, Computer Science (R0)
