Abstract
With the development of IT and scientific technology, very large amounts of knowledge data are continuously being created, and the big data era can be said to have arrived. RDF stores that insert data into knowledge bases and query them must therefore scale up to handle such large data sources. To this end, we propose a scalable distributed RDF store built on a distributed database. It uses bulk loading to store billions of triples and responds quickly to user queries. To achieve this, we introduce a bulk-loading algorithm based on the MapReduce framework and a SPARQL query processing engine that connects to a large distributed database. Experimental results show that the proposed bulk-loading algorithm achieves a throughput of 67.893K triples per second when loading approximately 33 billion triples. The experiment therefore demonstrates that the proposed RDF store can manage data at the scale of billions of triples.
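The abstract says only that bulk loading is done with MapReduce. The following is a minimal in-memory sketch of how such a job is commonly structured. It is not the authors' algorithm. It assumes several things: three permuted indexes (SPO, POS, OSP), range partitioning of row keys by a list of split keys, and per-reducer output that is deduplicated and sorted, as sorted bulk-load files would be. The names `INDEX_ORDERS`, `SEP`, `map_phase`, `shuffle_phase`, `reduce_phase` and `bulk_load` are all hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical index layout: the abstract does not say which permutations the store keeps.
INDEX_ORDERS: Dict[str, Tuple[int, int, int]] = {
    "spo": (0, 1, 2),
    "pos": (1, 2, 0),
    "osp": (2, 0, 1),
}
SEP = "\x00"  # sorts before every printable character, so keys order term by term


def map_phase(triples: Iterable[Triple]) -> Iterator[Tuple[str, str]]:
    """Map: emit one (index, row_key) pair per index ordering of each triple."""
    for triple in triples:
        for index, order in INDEX_ORDERS.items():
            yield index, SEP.join(triple[i] for i in order)


def shuffle_phase(pairs: Iterable[Tuple[str, str]],
                  split_keys: List[str]) -> List[Dict[str, List[str]]]:
    """Shuffle: range-partition row keys so each reducer owns a contiguous key range."""
    partitions: List[Dict[str, List[str]]] = [
        defaultdict(list) for _ in range(len(split_keys) + 1)
    ]
    for index, key in pairs:
        partitions[bisect.bisect_right(split_keys, key)][index].append(key)
    return partitions


def reduce_phase(partition: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reduce: deduplicate and sort the keys of each index within one partition."""
    return {index: sorted(set(keys)) for index, keys in partition.items()}


def bulk_load(triples: Iterable[Triple],
              split_keys: List[str]) -> List[Dict[str, List[str]]]:
    """Run map, shuffle and reduce in memory; one output dict per reducer."""
    return [reduce_phase(p) for p in shuffle_phase(map_phase(triples), split_keys)]


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:alice", "foaf:knows", "ex:bob"),  # duplicate, dropped by the reducer
]
result = bulk_load(triples, split_keys=["ex:b"])
```

With `split_keys=["ex:b"]`, reducer 0 receives every row key that sorts before `ex:b`, and reducer 1 receives the rest. Each reducer's output is therefore a sorted, non-overlapping key range. Range partitioning is chosen here because sorted, disjoint key ranges are what a bulk loader into a sorted distributed table typically needs. In a real Hadoop job, this role is usually played by a total-order partitioner rather than the toy `bisect` lookup used here.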

Acknowledgments
This work was supported by the IT R&D program of MSIP/IITP [B010-15-0353, High performance database solution development for Integrated big data monitoring and Analytics].
Cite this article
Um, JH., Lee, S., Kim, TH. et al. Distributed RDF store for efficient searching billions of triples based on Hadoop. J Supercomput 72, 1825–1840 (2016). https://doi.org/10.1007/s11227-016-1670-6
DOI: https://doi.org/10.1007/s11227-016-1670-6