DOI: 10.1145/3589334.3645704
research-article

FedUP: Querying Large-Scale Federations of SPARQL Endpoints

Published: 13 May 2024

Abstract

Processing SPARQL queries over large federations of SPARQL endpoints is crucial for keeping the Semantic Web decentralized. Despite the existence of hundreds of SPARQL endpoints, current federation engines only scale to dozens. One major issue comes from the current definition of the source selection problem, i.e., finding the minimal set of SPARQL endpoints to contact per triple pattern. Even when this per-triple-pattern selection is minimal, only a few of the resulting combinations of sources may return results. Consequently, most of the query processing time is wasted evaluating combinations that return no results. In this paper, we introduce the concept of Result-Aware query plans, which ensures that every subquery of the query plan effectively contributes to the result of the query. We propose FedUP, a new federation engine that produces Result-Aware query plans by tracking the provenance of query results. However, getting query results requires computing source selection, and computing source selection requires query results. To break this vicious cycle, FedUP computes results and provenances on tiny quotient summaries of federations, at the cost of source selection accuracy. Experimental results on federated benchmarks demonstrate that FedUP outperforms state-of-the-art federation engines by orders of magnitude in the context of large-scale federations.
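The abstract contrasts per-triple-pattern source selection with Result-Aware plans derived from result provenance on a quotient summary. The Python sketch below illustrates that idea on a toy two-endpoint federation; it is not FedUP's implementation. The quotient function (each IRI mapped to its scheme and authority, every literal to one class, predicates kept unchanged), the naive nested-loop evaluation, and all function names and example IRIs are hypothetical choices made for illustration.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import urlsplit

Triple = Tuple[str, str, str]
Federation = Dict[str, Set[Triple]]  # endpoint name -> its triples (toy, in-memory)

def is_var(term: str) -> bool:
    return term.startswith("?")

def summarize_term(term: str) -> str:
    """Hypothetical quotient function: IRI -> scheme://authority, any literal -> one class."""
    if is_var(term):
        return term
    if term.startswith('"'):
        return '"literal"'
    parts = urlsplit(term)
    if parts.scheme and parts.netloc:
        return f"{parts.scheme}://{parts.netloc}"
    return term

def summarize_triple(t: Triple) -> Triple:
    s, p, o = t
    return (summarize_term(s), p, summarize_term(o))  # predicates kept as-is (assumption)

def summarize(federation: Federation) -> Federation:
    return {src: {summarize_triple(t) for t in ts} for src, ts in federation.items()}

def summarize_query(bgp: List[Triple]) -> List[Triple]:
    return [summarize_triple(tp) for tp in bgp]  # same function applied to query constants

def match(pattern: Triple, triple: Triple, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` matches `triple`; return None on a conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended

def evaluate_with_provenance(bgp: List[Triple], federation: Federation):
    """Naive nested-loop BGP evaluation; each solution records the endpoint used per pattern."""
    solutions = [({}, ())]
    for pattern in bgp:
        next_solutions = []
        for binding, prov in solutions:
            for src, triples in federation.items():
                for triple in triples:
                    extended = match(pattern, triple, binding)
                    if extended is not None:
                        next_solutions.append((extended, prov + (src,)))
        solutions = next_solutions
    return solutions

def result_aware_combinations(bgp: List[Triple], federation: Federation) -> Set[Tuple[str, ...]]:
    """Endpoint combinations (one per triple pattern) that produce at least one result."""
    return {prov for _, prov in evaluate_with_provenance(bgp, federation)}

def per_pattern_selection(bgp: List[Triple], federation: Federation) -> List[List[str]]:
    """Classic source selection: endpoints relevant to each triple pattern in isolation."""
    return [
        sorted(src for src, ts in federation.items()
               if any(match(tp, t, {}) is not None for t in ts))
        for tp in bgp
    ]

# Toy federation with hypothetical example IRIs.
federation: Federation = {
    "shopA": {
        ("http://shopA.example/offer1", "http://schema.org/itemOffered", "http://vendor.example/p1"),
        ("http://shopA.example/offer1", "http://schema.org/price", '"10"'),
    },
    "shopB": {
        ("http://shopB.example/offer7", "http://schema.org/itemOffered", "http://vendor.example/p2"),
        ("http://shopB.example/offer7", "http://schema.org/price", '"12"'),
    },
}
bgp = [
    ("?offer", "http://schema.org/itemOffered", "?product"),
    ("?offer", "http://schema.org/price", "?price"),
]

naive = set(product(*per_pattern_selection(bgp, federation)))
exact = result_aware_combinations(bgp, federation)
approx = result_aware_combinations(summarize_query(bgp), summarize(federation))
print(len(naive), sorted(exact), sorted(approx))
```

Here per-pattern selection deems both endpoints relevant to both triple patterns, giving four combinations, while provenance shows that only (shopA, shopA) and (shopB, shopB) produce results; the summary finds the same two. Because the same function summarizes both the data and the query constants, every combination that yields results on the full data also yields results on the summary, but not conversely: had both shops minted offer IRIs under a single authority, the summary would also keep the two empty cross combinations, an instance of the kind of accuracy loss the abstract refers to.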

    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. federated query processing
    2. semantic web
    3. source selection

    Funding Sources

    • ANR

    Conference

    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 84
    • Downloads (Last 12 months): 84
    • Downloads (Last 6 weeks): 12
    Reflects downloads up to 05 Mar 2025
