Querying Interlinked Data by Bridging RDF Molecule Templates

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIX

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11310)

Abstract

Linked Data initiatives have encouraged the publication of a large number of RDF datasets, created independently by different data providers. These datasets can be accessed through different Web interfaces, e.g., SPARQL endpoints; however, federated query engines are still required to provide an integrated view of them. Given the large number of Web-accessible RDF datasets, SPARQL federated query engines implement query processing techniques to effectively select the relevant datasets that provide the data required to answer a query. Existing federated query engines usually rely on coarse-grained description methods in which datasets are characterized by their vocabularies or schemas, while details about the data in a dataset, e.g., classes, properties, or relations, are ignored. This lack of detailed source descriptions may lead to the erroneous selection of data sources for a query, and to unnecessary data retrieval and source communication, thus degrading the performance of query processing over the federation. We address the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of RDF molecule templates, i.e., abstract descriptions of entities belonging to the same RDF class, and uses them for source selection as well as for query decomposition and optimization. We empirically study the performance and continuous efficiency of MULDER on existing benchmarks and compare it with existing federated SPARQL query engines. The experimental results suggest that RDF molecule templates empower MULDER and allow it to select RDF data sources in a way that not only reduces execution time but also increases answer completeness and continuous efficiency.
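
To make the idea of class-level source descriptions concrete, the following minimal Python sketch shows how such descriptions could drive source selection. It is an illustration only and not MULDER's implementation: the `MoleculeTemplate` fields, the grouping of triple patterns by subject, the `ex:` vocabulary, and the endpoint names are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeTemplate:
    """Hypothetical class-level description of one kind of entity in a source."""
    rdf_class: str         # RDF class the described entities belong to
    predicates: frozenset  # properties that entities of this class carry
    sources: frozenset     # data sources (endpoints) holding such entities


# Made-up federation of three sources; all names are illustrative.
TEMPLATES = [
    MoleculeTemplate("ex:Drug",
                     frozenset({"rdf:type", "ex:name", "ex:target"}),
                     frozenset({"drugs-endpoint"})),
    MoleculeTemplate("ex:Target",
                     frozenset({"rdf:type", "ex:label"}),
                     frozenset({"targets-endpoint"})),
    MoleculeTemplate("ex:Disease",
                     frozenset({"rdf:type", "ex:name"}),
                     frozenset({"diseases-endpoint"})),
]


def group_by_subject(triple_patterns):
    """Collect the predicates used with each subject (assumes constant predicates)."""
    groups = {}
    for subject, predicate, _obj in triple_patterns:
        groups.setdefault(subject, set()).add(predicate)
    return groups


def select_sources(templates, triple_patterns):
    """For each subject, keep the sources whose template covers all of its predicates."""
    selection = {}
    for subject, predicates in group_by_subject(triple_patterns).items():
        matching = {
            source
            for template in templates
            if predicates <= template.predicates
            for source in template.sources
        }
        selection[subject] = sorted(matching)
    return selection


# Triple patterns of a small query, written as (subject, predicate, object).
QUERY = [
    ("?drug", "ex:name", "?name"),
    ("?drug", "ex:target", "?target"),
    ("?target", "ex:label", "?label"),
]

print(select_sources(TEMPLATES, QUERY))
```

In this made-up federation, a description based on single predicates would consider both `drugs-endpoint` and `diseases-endpoint` for the `ex:name` pattern. Matching the whole group of predicates against class-level descriptions leaves only `drugs-endpoint` for `?drug`, which is the kind of pruning that finer-grained source descriptions are meant to enable.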

Notes

  1. https://www.w3.org/TR/rdf11-concepts/.
  2. https://www.w3.org/TR/2013/REC-sparql11-query-20130321/.
  3. https://github.com/SDM-TIB/MulderTLDKS.
  4. https://networkx.github.io/.
  5. BSBM queries can be found in Appendix A.
  6. The graph visualization was generated using the open-source software platform Cytoscape: http://www.cytoscape.org/.
  7. FedBench queries can be found at http://fedbench.fluidops.net/resource/Queries.
  8. A lower number of connected components indicates stronger connectivity.
  9. LSLOD queries can be found in Appendix B.

Acknowledgements

This work has been partially funded by the EU Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua), and by the EU H2020 programme for the projects BigDataEurope (GA 644564) and iASiS (GA 727658). Mikhail Galkin is supported by a scholarship from the German Academic Exchange Service (DAAD).

Author information

Corresponding author

Correspondence to Kemele M. Endris.

Appendices

A BSBM Queries

[Figures d to p: listings of the BSBM queries]

B LSLOD Queries

[Figures q to aa: listings of the LSLOD queries]

Copyright information

© 2018 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Endris, K.M., Galkin, M., Lytra, I., Mami, M.N., Vidal, M.-E., Auer, S. (2018). Querying Interlinked Data by Bridging RDF Molecule Templates. In: Hameurlain, A., Wagner, R., Benslimane, D., Damiani, E., Grosky, W. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIX. Lecture Notes in Computer Science, vol 11310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58415-6_1

  • DOI: https://doi.org/10.1007/978-3-662-58415-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58414-9

  • Online ISBN: 978-3-662-58415-6

  • eBook Packages: Computer Science, Computer Science (R0)
