Distributed RDFS Reasoning with MapReduce

Cetin, Yigit; Abul, Osman

doi:10.1007/978-3-319-09465-6_32

Abstract

We live in the big data age, in which many computational tasks either generate or consume large datasets. This makes parallel and distributed computing key to scalability. MapReduce is a programming model for processing large datasets in a parallel and distributed fashion on a cluster of computers. Because the size and complexity of RDFS documents are increasing rapidly, RDFS reasoning has to embrace big data solutions. The output of an RDFS reasoning job can serve as input to another job, and that output grows as the input documents get bigger. In this study, an indexing method is proposed to speed up RDFS reasoning over Hadoop clusters. We also explore the utility of caching and of the Hadoop ecosystem tools Apache Hive and Apache Pig for this task. Experimental evaluations on the DBpedia and Freebase datasets show that the indexing method is quite effective and offers scalable solutions. The performance of caching and Apache Hive is also found to be acceptable.
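
To make the idea of RDFS reasoning as a MapReduce job more concrete, the following is a minimal single-process sketch of one standard RDFS entailment rule (rdfs9: if C is a subclass of D and x has type C, then x has type D) written as a map step, a shuffle, and a reduce step. It is not the indexing method proposed in the chapter, and it does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `rdfs9_step`) and the `ex:` example triples are assumptions made purely for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def map_phase(triples):
    """Emit (class, tagged value) pairs so both rule premises meet on the shared class."""
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            # (C subClassOf D): key on the subclass C, carry the superclass D.
            yield s, ("super", o)
        elif p == RDF_TYPE:
            # (x type C): key on the class C, carry the instance x.
            yield o, ("instance", s)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle stage."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Join instances with superclasses that share a key and emit inferred triples."""
    supers = [v for tag, v in values if tag == "super"]
    instances = [v for tag, v in values if tag == "instance"]
    for inst in instances:
        for sup in supers:
            yield (inst, RDF_TYPE, sup)


def rdfs9_step(triples):
    """Apply rdfs9 once and return only the triples that were not already present."""
    known = set(triples)
    derived = set()
    for values in shuffle(map_phase(known)).values():
        derived.update(reduce_phase(values))
    return derived - known


def rdfs9_closure(triples):
    """Repeat the step until no new triples appear (illustrative fixed-point loop)."""
    closure = set(triples)
    while True:
        new = rdfs9_step(closure)
        if not new:
            return closure
        closure |= new
```

In this sketch each pass over the data corresponds to one job, and the output of one pass becomes the input of the next, which is the growth pattern the abstract describes. A real deployment would also need the remaining RDFS rules and a strategy for duplicate elimination, neither of which is shown here.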

Author information

Authors and Affiliations

Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
Yigit Cetin & Osman Abul

Corresponding author

Correspondence to Osman Abul.

Editor information

Editors and Affiliations

Polish Academy of Sciences, Gliwice, Poland
Tadeusz Czachórski
Imperial College London, London, United Kingdom
Erol Gelenbe
Imperial College London, London, United Kingdom
Ricardo Lent

About this paper

Cite this paper

Cetin, Y., Abul, O. (2014). Distributed RDFS Reasoning with MapReduce. In: Czachórski, T., Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-09465-6_32

DOI: https://doi.org/10.1007/978-3-319-09465-6_32
Published: 25 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09464-9
Online ISBN: 978-3-319-09465-6
eBook Packages: Computer Science; Computer Science (R0)
