Distributed RDF store for efficient searching billions of triples based on Hadoop

Um, Jung-Ho; Lee, Seungwoo; Kim, Tae-Hong; Jeong, Chang-Hoo; Song, Sa-Kwang; Jung, Hanmin

doi:10.1007/s11227-016-1670-6

Published: 16 February 2016

Volume 72, pages 1825–1840, (2016)

The Journal of Supercomputing

Jung-Ho Um¹, Seungwoo Lee¹, Tae-Hong Kim¹, Chang-Hoo Jeong¹, Sa-Kwang Song¹ & Hanmin Jung¹

514 Accesses
9 Citations

Abstract

With advances in IT and scientific technology, very large amounts of knowledge data are continuously being created, and the big data era can be said to have arrived. RDF stores, which insert data into and query knowledge bases, must therefore scale up to handle such large data sources. To this end, we propose a scalable distributed RDF store, built on a distributed database, that bulk-loads billions of triples and responds to user queries quickly. To achieve this, we introduce a bulk-loading algorithm that uses the MapReduce framework and a SPARQL query processing engine that connects to a large distributed database. Experimental results show that the proposed bulk-loading algorithm achieves a throughput of 67.893K triples per second when loading approximately 33 billion triples. The experiment thus demonstrates that the proposed RDF store can manage data at the scale of billions of triples.
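
To put the reported throughput in perspective, the two figures in the abstract can be combined into an approximate total loading time. The following is only a back-of-the-envelope sketch: it assumes the 67.893K triples per second rate is a sustained average over the whole load and treats "approximately 33 billion" as exactly 33 billion. The function and variable names are illustrative and do not come from the article.

```python
def load_time_seconds(total_triples: float, triples_per_second: float) -> float:
    """Return the time needed to load total_triples at a constant rate."""
    if triples_per_second <= 0:
        raise ValueError("triples_per_second must be positive")
    return total_triples / triples_per_second


# Figures taken from the abstract; both are approximate.
TOTAL_TRIPLES = 33e9          # "approximately 33 billion triples"
TRIPLES_PER_SECOND = 67_893   # "67.893K triples per second"

seconds = load_time_seconds(TOTAL_TRIPLES, TRIPLES_PER_SECOND)
hours = seconds / 3600
days = hours / 24

print(f"{seconds:,.0f} s = {hours:.1f} h = {days:.1f} days")
```

Under these assumptions the full load takes on the order of five to six days of sustained bulk-loading, which indicates the scale of the experiment.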

Acknowledgments

This work was supported by the IT R&D program of MSIP/IITP [B010-15-0353, High performance database solution development for integrated big data monitoring and analytics].

Author information

Authors and Affiliations

Korea Institute of Science and Technology Information, 245 Daehangno, Yuseong-gu, Taejon, 305-806, Korea
Jung-Ho Um, Seungwoo Lee, Tae-Hong Kim, Chang-Hoo Jeong, Sa-Kwang Song & Hanmin Jung

Corresponding author

Correspondence to Chang-Hoo Jeong.

About this article

Cite this article

Um, JH., Lee, S., Kim, TH. et al. Distributed RDF store for efficient searching billions of triples based on Hadoop. J Supercomput 72, 1825–1840 (2016). https://doi.org/10.1007/s11227-016-1670-6

Issue Date: May 2016
DOI: https://doi.org/10.1007/s11227-016-1670-6
