Abstract
We live in an interconnected world in which the volume of heterogeneous data, such as RDF, is continually increasing. Much work has been done on managing large amounts of RDF data, but solutions based on RDBMSs have significant scalability issues given the magnitude of modern data. In this paper we describe our solution for storing and querying RDF data in the cloud, based on HBase and MapReduce. A vertical-partitioning-like model is used in HBase to reduce table size and to obtain good SPARQL query performance. For complex queries on large data, we propose using cascading MapReduce jobs on HBase to improve efficiency. Our experiments on LUBM show that our system can store large RDF graphs and achieve good query efficiency.
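To make the two ideas in the abstract concrete, the following is a minimal in-memory sketch, not the system described in the paper. It assumes that "vertical partitioning" means grouping triples into one table per predicate, and it uses a plain Python function to stand in for a single join stage of a cascaded MapReduce plan. The function names, the sample triples, and the LUBM-style predicate names are all illustrative assumptions; no HBase or Hadoop API is involved.

```python
from collections import defaultdict

def partition_by_predicate(triples):
    """Group (subject, predicate, object) triples into one table per predicate.

    Assumption: this mimics a vertical-partitioning layout, where each
    predicate gets its own (subject, object) table.
    """
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)

def join_on_subject(left_rows, right_rows):
    """One join stage: pair rows that share a subject.

    Assumption: in a cascaded plan, each such stage would be one
    MapReduce job whose output feeds the next stage.
    """
    by_subject = defaultdict(list)
    for s, o in right_rows:
        by_subject[s].append(o)
    return [(s, lo, ro) for s, lo in left_rows for ro in by_subject.get(s, [])]

# Hypothetical sample data, loosely styled after LUBM.
triples = [
    ("alice", "rdf:type", "ub:GraduateStudent"),
    ("alice", "ub:takesCourse", "course1"),
    ("bob", "rdf:type", "ub:Professor"),
    ("bob", "ub:teacherOf", "course1"),
]

tables = partition_by_predicate(triples)

# Query sketch: which courses does each graduate student take?
students = [(s, o) for s, o in tables["rdf:type"] if o == "ub:GraduateStudent"]
result = join_on_subject(students, tables["ub:takesCourse"])
print(result)
```

Because each triple pattern in the query touches only the table of its own predicate, the join reads two small tables rather than scanning every triple, which is the intuition behind the reduced table size mentioned above.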
Notes
1. Apache Jena, http://jena.apache.org.
2. RDF Current Status, http://www.w3.org/standards/techs/rdf#w3c_all.
3. SPARQL 1.1 Query Language, http://www.w3.org/TR/sparql11-query/.
4. Accumulo, http://accumulo.apache.org/.
Acknowledgments
This work is supported in part by the National Key Basic Research and Development (973) Program of China (No. 2013CB329606), and the Co-construction Project of Beijing Municipal Commission of Education.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, K., Wu, B., Wang, B. (2015). A Distributed RDF Storage and Query Model Based on HBase. In: Xiao, X., Zhang, Z. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9391. Springer, Cham. https://doi.org/10.1007/978-3-319-23531-8_1
DOI: https://doi.org/10.1007/978-3-319-23531-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23530-1
Online ISBN: 978-3-319-23531-8
eBook Packages: Computer Science, Computer Science (R0)