SShare: a simulator for studying and evaluating decentralized SPARQL query processing

  • Original Article
Personal and Ubiquitous Computing

Abstract

Previously, we proposed efficient, scalable decentralized processing of SPARQL queries for an ad hoc Semantic Web data sharing system and explored optimization techniques. However, it has proven difficult to measure the performance of the proposed query processing in a decentralized setting with existing tools. This is because assessments of SPARQL query performance have typically targeted centralized or single-machine settings, and the node-to-node communication costs incurred when (sub-)queries are forwarded among multiple nodes have rarely been taken into consideration. We therefore developed a simulator, SShare, that bridges Jena, a Java framework that supports querying RDF data with SPARQL, and ns-3 (network simulator 3), a discrete-event network simulator using C++ and Python. With SShare, one can submit any valid SPARQL query that involves RDF data of interest scattered across distributed hosts (the details of which are unknown to the query initiator), evaluate important performance metrics obtained at the network level (e.g., the inter-site data transmission volume and communication delay), and finally obtain visualized results. We anticipate that SShare will benefit others who wish to better capture and analyze the inherent features of various distributed and decentralized SPARQL processing mechanisms over a large-scale network.

Figs. 1–13 (images not reproduced)

Notes

  1. We borrowed the term from ad hoc networking because the ad hoc environment for Semantic Web data sharing has many features in common with ad hoc networking: no centralized authority, self-organization, multiple nodes connected by links, and dynamics.

  2. The source code and documentation for SShare are under constant maintenance and development, and can be accessed via http://sshare.sinaapp.com.

  3. Chord provides a unique mapping between an identifier space and a set of nodes; each node is therefore associated with an identifier. Chord maps an identifier, say id, to the node with the smallest identifier greater than or equal to id (wrapping around the identifier circle); this node is called the successor (node) of id.

  4. A similar triple-indexing approach was also presented in Atlas [7].

  5. Put simply, Chord uses the SHA-1 hash function to obtain the key identifier Hash(key) of a given key and then stores the key at the successor node of that identifier.

  6. Chord was unable to provide the functionality required for this purpose since it merely associates identifiers with successor nodes. We, therefore, adopted DHASH [5] as shown in Fig. 5.

  7. We proposed to apply the move-small strategy, when evaluating a SPARQL query that contains more than two conjunction graph patterns, to resolve the query in an optimized fashion by using the frequency information (see Sect. 2) available in the location table of related index nodes [15].

  8. Information[1].query and Information[3].query are the same but they are associated with different keys. We distinguish between them to point out that Information[1].query obtains its answer directly from storage nodes. Subsequently, the answer to Information[3].query is acquired by running the query against a merged RDF graph consisting of individual RDF graphs collected by running Information[1].query as mentioned earlier.

  9. A solution mapping can be broken down into a set of tuples that contain variables and their corresponding values in RDF terms [10].

  10. http://www.pudn.com/downloads448/sourcecode/java/detail1890872.html.

  11. The ChordIpv4 module was developed by Harjot Gill to support Chord/DHASH; see http://code.nsnam.org/gillh/ns-3-chord/.

  12. The function is frequently used during the construction of location tables. For example, when a node that has a triple (a:person b:name ‘jason’) as in Fig. 1 joins the Semantic Web data sharing system, an index on its subject needs to be built and the function insert(Index(s), s:http://a/person) will be invoked.

  13. We tested different ratios of index nodes to storage nodes and found that the more index nodes there are, the shorter the query response time. This is because, with fewer index nodes, the probability that two (or more) queries are forwarded to the same index node is higher; given the limited bandwidth, responding to these queries is then very likely to take longer.

  14. We used the ns-3 default values for the transmission rate, propagation delay, and MTU.

  15. The maximum number of copies of an RDF triple is a tunable parameter.

  16. http://www.nsnam.org/

  17. http://www.riverbed.com/.

  18. http://tetcos.com/.

  19. http://librdf.org/.
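
The Chord mapping described in notes 3 and 5 can be sketched as follows. This is a minimal illustration in Python, not SShare's implementation (which relies on the ns-3 ChordIpv4 module): the function names chord_id and successor, the node names, and the small 6-bit ring in the usage lines are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

SHA1_BITS = 160  # SHA-1 produces 160-bit digests


def chord_id(key: str, bits: int = SHA1_BITS) -> int:
    """Hash a key (or node name) onto a 2**bits identifier circle using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(node_ids, ident: int) -> int:
    """Return the first node identifier equal to or following ident,
    wrapping around to the smallest identifier at the end of the circle."""
    ring = sorted(node_ids)
    if not ring:
        raise ValueError("the ring has no nodes")
    i = bisect_left(ring, ident)  # index of first node id >= ident
    return ring[i % len(ring)]    # i == len(ring) wraps to ring[0]


if __name__ == "__main__":
    # Hypothetical ring: eight nodes hashed onto a small 6-bit circle.
    nodes = {chord_id(f"node-{i}", bits=6) for i in range(8)}
    key = chord_id("http://a/person", bits=6)
    print("key identifier:", key, "-> stored at node:", successor(nodes, key))
```

Sorting the node identifiers and using a binary search keeps the lookup short for illustration; a real Chord node instead resolves the successor through finger tables in a logarithmic number of hops, which is where the communication costs that SShare measures arise.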

References

  1. Beckett D (2001) The design and implementation of the Redland RDF application framework. In: Proceedings of the 10th international conference on world wide web. ACM, New York, NY, USA, pp 449–456

  2. Beckett D (2014) RDF 1.1 N-Triples: a line-based syntax for an RDF graph. W3C Recommendation. http://www.w3.org/TR/n-triples/, 25 Feb 2014

  3. Beckett D, Berners-Lee T, Prud’hommeaux E, Carothers G (2014) RDF 1.1 Turtle: terse RDF triple language. W3C recommendation. http://www.w3.org/TR/turtle/, 25 Feb 2014

  4. Cai M, Frank M (2004) RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th international conference on world wide web. ACM, New York, NY, USA, pp 650–657

  5. Dabek F, Brunskill E, Kaashoek MF, Karger D, Morris R, Stoica I, Balakrishnan H (2001) Building peer-to-peer systems with Chord, a distributed lookup service. In: Proceedings of the eighth workshop on hot topics in operating systems. IEEE, pp 81–86

  6. Enslow Jr PH, Saponas TG (1981) Distributed and decentralized control in fully distributed processing systems—a survey of applicable models. Final Technical Report GIT-ICS-81/02, School of Information and Computer Science, Georgia Institute of Technology, Atlanta, GA, USA

  7. Kaoudi Z, Koubarakis M, Kyzirakos K, Miliaraki I, Magiridou M, Papadakis-Pesaresi A (2010) Atlas: storing, updating and querying RDF(S) data on top of DHTs. Web Semant Sci Serv Agents World Wide Web 8(4):271–277

  8. Liarou E, Idreos S, Koubarakis M (2006) Evaluating conjunctive triple pattern queries over large structured overlay networks. In: Proceedings of the fifth international conference on the semantic web. Springer, Athens, GA, USA, pp 399–413

  9. Ns-3 project (2013) Ns-3 Tutorial. http://www.nsnam.org/docs/tutorial/html/index.html

  10. Pérez J, Arenas M, Gutierrez C (2009) Semantics and complexity of SPARQL. ACM Trans Database Syst 34(3):1–45

  11. Prud’hommeaux E, Seaborne A (2008) SPARQL query language for RDF. W3C recommendation. http://www.w3.org/TR/rdf-sparql-query/. 15 Jan 2008

  12. Schmidt M, Hornung T, Lausen G, Pinkel C (2009) SP2Bench: a SPARQL performance benchmark. In: Proceedings of the 25th international conference on data engineering. IEEE Computer Society, Shanghai, China, pp 222–233

  13. Seaborne A, Polleres A, Feigenbaum L, Williams GT (2013) SPARQL 1.1 federated query. W3C recommendation. http://www.w3.org/TR/sparql11-federated-query/. 21 Mar 2013

  14. Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for Internet applications. In: Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications. ACM, San Diego, California, USA, pp 149–160

  15. Zhou J, Bochmann GV, Shi Z (2014) Supporting decentralized SPARQL queries in an ad-hoc semantic web data sharing system. Int J Netw Comput 4(1):88–110

Acknowledgments

This work was funded by the Engineering Disciplines Planning Project of the Communication University of China (No. 3132014XNG1453) and the National Key Technology R&D Program (No. 2013BAH66F02). The authors also acknowledge the input of PAPD and CICAEET.

Author information

Corresponding author

Correspondence to Zhiguo Qu.

About this article

Cite this article

Zhou, J., Huang, Q., Xie, W. et al. SShare: a simulator for studying and evaluating decentralized SPARQL query processing. Pers Ubiquit Comput 19, 1087–1097 (2015). https://doi.org/10.1007/s00779-015-0878-4

  • DOI: https://doi.org/10.1007/s00779-015-0878-4
