Abstract
We live in the big data age, in which many computational tasks either generate or consume large datasets. This makes parallel and distributed computing key to scalability. MapReduce is a programming model for processing large datasets in a parallel and distributed fashion on a cluster of computers. As the size and complexity of RDFS documents increase rapidly, RDFS reasoning has to embrace big data solutions. The output of one RDFS reasoning job can be the input to another, and that output grows as the input documents get bigger. In this study, an indexing method is proposed to speed up RDFS reasoning over Hadoop clusters. We also explore the utility of caching and of the Hadoop ecosystem tools Apache Hive and Apache Pig for this task. Experimental evaluations on the DBpedia and Freebase datasets show that the indexing method is quite effective and offers scalable solutions. The performance of caching and of Apache Hive is also found to be acceptable.
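To make the idea of RDFS reasoning as a MapReduce job concrete, the following is a minimal single-machine sketch in Python of one standard RDFS entailment rule (rdfs9: if `x rdf:type C` and `C rdfs:subClassOf D`, then `x rdf:type D`) written as a map phase, a shuffle, and a reduce phase. It is not the indexing method proposed in the paper and does not use Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `rdfs9_step`) and the `ex:` sample triples are assumptions made for illustration only.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def map_phase(triples):
    """Emit (join_key, tagged_value) pairs keyed on the class shared by both premises."""
    for s, p, o in triples:
        if p == RDF_TYPE:
            yield o, ("instance", s)      # s is an instance of class o
        elif p == SUBCLASS_OF:
            yield s, ("superclass", o)    # class s has superclass o


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Join the instances and superclasses of one class to derive new type triples."""
    instances = [v for tag, v in values if tag == "instance"]
    superclasses = [v for tag, v in values if tag == "superclass"]
    for inst in instances:
        for sup in superclasses:
            yield (inst, RDF_TYPE, sup)


def rdfs9_step(triples):
    """Apply rule rdfs9 once and return only the triples not already present."""
    derived = set()
    for values in shuffle(map_phase(triples)).values():
        derived.update(reduce_phase(values))
    return derived - set(triples)


# Hypothetical sample data, not taken from DBpedia or Freebase.
sample = [
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
]
new_triples = rdfs9_step(sample)
```

A single pass derives only one step of inference. Computing the full closure would require repeating such passes (and the other RDFS rules) until no new triples appear, which is one reason the output of one reasoning job becomes the input to another.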
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cetin, Y., Abul, O. (2014). Distributed RDFS Reasoning with MapReduce. In: Czachórski, T., Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-09465-6_32
DOI: https://doi.org/10.1007/978-3-319-09465-6_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09464-9
Online ISBN: 978-3-319-09465-6
eBook Packages: Computer Science, Computer Science (R0)