Skip to main content
Log in

Dynamic and fast processing of queries on large-scale RDF data

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

As RDF data continue to gain popularity, we witness the fast growing trend of RDF datasets in both the number of RDF repositories and the size of RDF datasets. Many known RDF datasets contain billions of RDF triples (subject, predicate and object). One of the grant challenges for managing these huge RDF data is how to execute RDF queries efficiently. In this paper, we address the query processing problems against the billion triple challenges. We first identify some causes for the problems of existing query optimization schemes, such as large intermediate results, initial query cost estimation errors. Then, we present our block-oriented dynamic query plan generation approach powered with pipelining execution. Our approach consists of two phases. In the first phase, a near-optimal execution plan for queries is chosen by identifying the processing blocks of queries. We group the join patterns sharing a join variable into building blocks of the query plan since executing them first provides opportunities to reduce the size of intermediate results generated. In the second phase, we further optimize the initial pipelining for a given query plan. We employ optimization techniques, such as sideways information passing and semi-join, to further reduce the size of intermediate results, improve the query processing cost estimation and speed up the performance of query execution. Experimental results on several RDF datasets of over a billion triples demonstrate that our approach outperforms existing RDF query engines that rely on dynamic programming based static query processing strategies.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

References

  1. Abadi DJ, Marcus A, Madden SR, Hollenbach K (2007) Scalable semantic web data management using vertical partitioning. In: Proceedings of VLDB 2007. ACM, pp 411–422

  2. Atre M, Chaoji V, Zaki MJ, Hendler JA (2010) Matrix bit loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of WWW 2010. ACM, pp 41–50

  3. Balkesen C, Teubner J, Alonso G, Özsu MT (2013) Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: Proceedings of ICDE’13, IEEE Computer Society

  4. Bernstein PA, Chiu D-MW (1981) Using semi-joins to solve relational queries. J Assoc Comput Mach 28(1):25–40

    Article  MathSciNet  MATH  Google Scholar 

  5. Binna R, Gassler W, Zangerle E, Pacher D, Specht G (2010) Spiderstore: exploiting main memory for efficient RDF graph representation and fast querying. In: Proceedings of workshop on semantic data management (SemData@VLDB) 2010

  6. Chebotko A, Lu S, Fotouhi F (2009) Semantics preserving SPARQL-to-SQL translation. Data Knowl Eng 68(10):973–1000

    Article  Google Scholar 

  7. Harth A, Umbrich J, Hogan A, Decker S (2007) YARS2: a federated repository for querying graph structured data from the web. In: Proceedings of ISWC/ASWC2007. Springer, Berlin, pp 211–224

  8. Hartig O, Bizer C, Freytag J-C (2009) Executing SPARQL queries over the web of linked data. In: Proceedings of ISWC 2009. Springer, Berlin, pp 293–309

  9. Hartig O, Heese R (2007) The SPARQL query graph model for query optimization. In: Proceedings of ESWC 2007. Springer, Berlin, pp 564–578

  10. Huang J, Abadi DJ, Ren K (2011) Scalable SPARQL querying of large RDF graphs. PVLDB 4(11):1123–1134

    Google Scholar 

  11. Ives ZG, Taylor NE (2008) Sideways information passing for push-style query processing. In: Proceedings of ICDE 2008

  12. Janik M, Kochut K (2005) BRAHMS: a workbench RDF store and high performance memory system for semantic association discovery. In: Proceedings of ISWC 2005. Springer, Berlin, pp 431–445

  13. Kim C, Sedlar E, Chhugani J, Kaldewey T, Nguyen AD, Blas AD, Lee VW, Satish N, Dubey P (2009) Sort vs. hash revisited: fast join implementation on modern multicore CPUs. PVLDB 2(2):1378–1389

    Google Scholar 

  14. Kossmann D, Stocker K (2000) Iterative dynamic programming: a new class of query optimization algorithms. ACM Trans Database Syst 25(1):4382

    Article  Google Scholar 

  15. LUBM (2005) http://swat.cse.lehigh.edu/projects/lubm/

  16. MonetDB (2010) Overview. http://monetdb.cwi.nl/

  17. Neumann T, Weikum G (2009) Scalable join processing on very large RDF graphs. In: Proceedings of SIGMOD 2009. ACM, pp 627–639

  18. Neumann T, Weikum G (2010a) The RDF-3X engine for scalable management of RDF data. VLDB J 19(1):91–113

    Article  Google Scholar 

  19. Neumann T, Weikum G (2010b) x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. PVLDB 3(1–2):256–263

    Google Scholar 

  20. Rohloff K, Schantz RE (2010) High-performance, massively scalable distributed systems using the mapreduce software framework: the shard triple-store. In: Proceedings of international workshop on programming support innovations for emerging distributed applications 2010 (PSI EtA ’10). ACM

  21. Selinger PG, Astrahan MM, Chamberlin DD, Lorie RA, Price TG (1979) Access path selection in a relational database management system. In: Proceedings of SIGMOD’79. Springer, Berlin, p 2334

  22. Semantic Web Challenge (2012) http://challenge.semanticweb.org/2012/

  23. Stocker K, Kossmann D, Braumandl R, Kemper KA (2001) Integrating semi-join-reducers into state of the art query processors. In: Proceedings of ICDE 2001, pp 575–584

  24. Stocker M, Seaborne A, Bernstein A, Kiefer C, Reynolds D (2008) SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of WWW 2008. ACM, pp 595–604

  25. SWEO Community Project (2010). Linking open data on the semantic web. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

  26. Udrea O, Pugliese A, Subrahmanian VS (2007) Grin: a graph based RDF index. In: Proceedings of the 22nd AAAI conference on artificial intelligence, pp 1465–1470

  27. UniProt (n.d.). UniProt RDF distribution. ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/

  28. W3C (2008) SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-query/

  29. Weiss C, Karras P, Bernstein A (2008) Hexastore: sextuple indexing for semantic web data management. PVLDB 1(1):1008–1019

    Google Scholar 

  30. Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of SIGMOD 2004. ACM, pp 335–346

  31. Yuan P, Liu P, Wu B, Liu L, Jin H, Zhang W (2013) TripleBit: a fast and compact system for large scale RDF data. PVLDB 6(7):517–528

    Google Scholar 

  32. Zou L, Mo J, Chen L, Özsu MT, Zhao D (2011) gStore: answering SPARQL queries via subgraph matching. PVLDB 8:482–493

    Google Scholar 

Download references

Acknowledgments

The research is supported by National Science Foundation of China (61073096) and National High Technology Research and Development Program of China (863 Program) under Grant No. 2012AA011003. Ling Liu acknowledges the partial support of her research from Grants of NSF CISE NetSE program, SaTC program and I/UCRC Fundamental Research Program as well as Intel ISTC on Cloud Computing.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pingpeng Yuan.

Appendices

LUBM queries

  • PREFIX rdf: \(<\)http://www.w3.org/1999/02/22-rdf-syntax-ns#\(>\)

  • PREFIX ub: \(<\)http://www.lehigh.edu/\(\sim \)zhp2/2004/0401/univ-bench.owl#\(>\)

Q1: :

SELECT ?x ?y ?z WHERE { ?y ub:teacherOf ?z . ?y rdf:type ub:Assistant-Professor . ?z rdf:type ub:Course . ?x ub:takesCourse ?z . ?x rdf:type ub:GraduateStudent . ?x ub:advisor ?y . }

Q2: :

SELECT ?x ?y ?z WHERE { ?x rdf:type ub:AssistantProfessor . ?y rdf:type ub:Department . ?x ub:worksFor ?y . ?z rdf:type ub:UndergraduateStudent . ?z ub:memberOf ?y . ?z ub:advisor ?x . }

Q3: :

SELECT ?x ?y WHERE { ?x rdf:type ub:AssistantProfessor . ?x ub:worksFor ?y . [] ub:memberOf ?y . ?y rdf:type ub:Department . }

Q4: :

SELECT ?x ?y WHERE { ?x rdf:type ub:FullProfessor . ?y rdf:type ub:UndergraduateStudent . ?y ub:advisor ?x . ?x ub:worksFor [] . }

Q5: :

SELECT ?x ?y ?z WHERE { ?z ub:subOrganizationOf ?y . ?y rdf:type ub:University . ?z rdf:type ub:Department . ?x ub:memberOf ?z . ?x rdf:type ub:GraduateStudent . ?x ub:undergraduateDegreeFrom ?y . }

Q6: :

SELECT ?x ?y ?z WHERE { ?y ub:teacherOf ?z . ?y rdf:type ub:FullProfessor . ?z rdf:type ub:Course . ?x ub:takesCourse ?z . ?x rdf:type ub:UndergraduateStudent . ?x ub:advisor ?y . }

Q7: :

SELECT ?x ?y WHERE { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse [] . ?y rdf:type ub:AssistantProfessor . ?x ub:advisor ?y . }

UniProt queries

Q1: :

SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?protein uni:organism [] . ?annotation rdf:type [] . ?annotation uni:range ?range . }

Q2: :

SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?protein uni:organism taxon:9606 . ?annotation rdf:type ?type . ?annotation rdfs:comment [] . }

Q3: :

SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?annotation rdf:type \(<\) http://purl.uniprot.org/core/Transmembrane_An-notation \(>\) . }

Q4: :

SELECT ?b ?ab WHERE { ?b rdf:type uni:Protein . ?a uni:replaces ?ab . ?ab uni:replacedBy ?b . }

Q5: :

SELECT ?protein ?annotation WHERE { ?protein uni:annotation ?annotation . ?protein rdf:type uni:Protein . ?protein uni:organism taxon:9606 . ?annotation rdf:type \(<\) http://purl.uni-prot.org/core/Disease_Annotation \(>\) . ?protein uni:modified “2008-07-22” . }

Q6: :

SELECT ?a ?vo WHERE { ?a rdfs:seeAlso ?vo . ?a uni:classifiedWith \(<\) http://purl.uniprot.org/keywords/67 \(>\) . ?b uni:annotation ?annotation . ?b rdf:type uni:Protein . ?a uni:replaces ?ab . ?ab uni:replacedBy ?b . }

Q7: :

SELECT ?annotation ?a WHERE { ?annotation rdf:type \(<\) http://purl.uniprot.org/core/-Transmembrane_Annotation \(>\) . ?annotation uni:range ?range . ?annotation rdfs:comment ?text . ?a rdfs:seeAlso ?vo. ?a uni:classifiedWith \(<\) http://purl.uniprot.org/keywords/67 \(>\) . ?a uni:annotation ?annotation . }

BTC 2012 queries

Q1: :

SELECT distinct ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a \(<\) http://www.w3.org/1999/02/-22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?b \(<\) http://www.w3.org/-1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . }

Q2: :

SELECT ?a ?l WHERE { ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?a dbpedia:deathPlace ?l . ?l pos:lat ?lat . }

Q3: :

SELECT ?p ?l WHERE { ?p dbpedia:name [] . ?p dbpedia:deathPlace ?l . ?p dbpedia:spouse ?c . ?p \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/-ontology/Person \(>\) . ?l pos:long ?long . }

Q4: :

SELECT ?a ?b WHERE { ?a dbpedia:-spouse ?b . ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?b \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?c owl:sameAs ?c2 . ?c2 pos:long [] . ?a dbpedia:deathPlace ?c . }

Q5: :

SELECT distinct ?a ?c ?c2 WHERE { ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?a dbpedia:placeOfBirth ?c . ?c owl:sameAs ?c2 . ?c2 pos:lat ?lat . ?c2 pos:long “-4.256901” . }

Q6: :

SELECT distinct ?a ?b ?c WHERE { ?a dbpedia:spouse ?b . ?a \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) [] . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c . ?c owl:sameAs ?c2 . ?c dbpedia:name ?d . }

Q7: :

SELECT distinct ?a ?b ?lat ?long WHERE { ?a dbpedia:spouse ?b . ?a \(<\) http://www.w3.org/-1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?b \(<\) http://www.w3.org/1999/02/22-rdf-syntax-ns#type \(>\) \(<\) http://dbpedia.org/ontology/Person \(>\) . ?a dbpedia:place-OfBirth ?c . ?b dbpedia:placeOfBirth ?c . ?c owl:sameAs ?c2 . ?c2 pos:lat ?lat . ?c2 pos:long ?long . }

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yuan, P., Xie, C., Jin, H. et al. Dynamic and fast processing of queries on large-scale RDF data. Knowl Inf Syst 41, 311–334 (2014). https://doi.org/10.1007/s10115-013-0726-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-013-0726-7

Keywords

Navigation