Distributed RDF store for efficient searching billions of triples based on Hadoop

The Journal of Supercomputing

Abstract

With the development of IT and scientific technology, very large amounts of knowledge data are continuously being created, and the big data era can be said to have arrived. RDF stores, which insert data into and query knowledge bases, therefore have to scale up to handle such large data sources. To this end, we propose a scalable distributed RDF store, built on a distributed database, that bulk-loads billions of triples and responds quickly to user queries. To achieve this, we introduce a bulk-loading algorithm that uses the MapReduce framework and a SPARQL query processing engine that connects to a large distributed database. Experimental results show that the proposed bulk-loading algorithm achieves 67.893K triples per second when loading approximately 33 billion triples. The experiment thus indicates that the proposed RDF store can manage data at the scale of billions of triples.
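To put the reported throughput in perspective, the short calculation below estimates the wall-clock time implied by the two figures in the abstract. It is a minimal sketch, not a result from the paper: it assumes that 67.893K triples per second is a sustained aggregate rate over the whole load and treats "approximately 33 billion" as exactly 33e9. The function name `load_time_seconds` is a hypothetical helper introduced only for illustration.

```python
def load_time_seconds(total_triples: float, triples_per_second: float) -> float:
    """Return the time needed to load total_triples at a constant rate."""
    if triples_per_second <= 0:
        raise ValueError("throughput must be positive")
    return total_triples / triples_per_second


# Figures taken from the abstract; both are rounded in the source.
TOTAL_TRIPLES = 33e9          # "approximately 33 billion triples"
TRIPLES_PER_SECOND = 67_893   # "67.893K triples per second"

# Assumption: the rate is sustained for the entire bulk load.
seconds = load_time_seconds(TOTAL_TRIPLES, TRIPLES_PER_SECOND)
hours = seconds / 3600
days = hours / 24

print(f"{seconds:,.0f} s = {hours:.1f} h = {days:.1f} days")
```

Under these assumptions the load takes roughly 135 hours, or between five and six days, which gives a sense of why a parallel bulk-loading path matters at this scale.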

Acknowledgments

This work was supported by the IT R&D program of MSIP/IITP [B010-15-0353, High performance database solution development for Integrated big data monitoring and Analytics].

Author information

Corresponding author

Correspondence to Chang-Hoo Jeong.

About this article

Cite this article

Um, JH., Lee, S., Kim, TH. et al. Distributed RDF store for efficient searching billions of triples based on Hadoop. J Supercomput 72, 1825–1840 (2016). https://doi.org/10.1007/s11227-016-1670-6

  • DOI: https://doi.org/10.1007/s11227-016-1670-6
