A Distributed RDF Storage and Query Model Based on HBase

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9391)

Abstract

We live in an interconnected world in which the amount of heterogeneous data, such as RDF, is continually increasing. Much work has addressed the management of very large RDF datasets. Solutions based on an RDBMS have significant scalability issues given the magnitude of data today. In this paper we describe our solution for storing and querying RDF data in the cloud, based on HBase and MapReduce. A vertical-partitioning-like model is used in HBase to reduce table size and to obtain good SPARQL query performance. For complex queries on large data, we propose using cascading MapReduce jobs on HBase to improve efficiency. Our experiments on LUBM show that our system can store large RDF graphs and achieve good query efficiency.
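The abstract does not give the table schema or the job plan, so the following is only a minimal in-memory sketch of the general idea behind vertical partitioning and a staged join, not the authors' implementation. It assumes one table per predicate keyed by subject, and it stands in for cascaded MapReduce jobs with two plain Python stages. The triples and the names `vertical_partition`, `match_pattern`, `join_on_subject`, and `graduate_students_in` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Toy triples in the style of LUBM; the identifiers are made up for illustration.
TRIPLES = [
    ("student1", "rdf:type", "ub:GraduateStudent"),
    ("student2", "rdf:type", "ub:GraduateStudent"),
    ("student3", "rdf:type", "ub:UndergraduateStudent"),
    ("student1", "ub:takesCourse", "course1"),
    ("student2", "ub:takesCourse", "course2"),
    ("student3", "ub:takesCourse", "course1"),
]


def vertical_partition(triples):
    """Build one subject -> {objects} table per predicate (assumed layout)."""
    tables = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        tables[p][s].add(o)
    return tables


def match_pattern(tables, predicate, obj=None):
    """Stage 1: scan only the table of one predicate.

    Returns subject -> set of objects, optionally filtered by a fixed object.
    """
    result = {}
    for s, objects in tables.get(predicate, {}).items():
        kept = {o for o in objects if obj is None or o == obj}
        if kept:
            result[s] = kept
    return result


def join_on_subject(left, right):
    """Stage 2: join two intermediate results on their shared subject."""
    return {s: (left[s], right[s]) for s in left.keys() & right.keys()}


def graduate_students_in(tables, course):
    # Roughly: SELECT ?x WHERE { ?x rdf:type ub:GraduateStudent .
    #                            ?x ub:takesCourse <course> }
    typed = match_pattern(tables, "rdf:type", "ub:GraduateStudent")
    enrolled = match_pattern(tables, "ub:takesCourse", course)
    return sorted(join_on_subject(typed, enrolled))
```

Each triple pattern touches only the table of its own predicate, which is the reason a per-predicate layout can keep scans small. In a cluster setting, each stage would plausibly map to one job whose output feeds the next.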

Notes

  1. Apache Jena, http://jena.apache.org.

  2. RDF Current Status, http://www.w3.org/standards/techs/rdf#w3c_all.

  3. SPARQL 1.1 Query Language, http://www.w3.org/TR/sparql11-query/.

  4. Accumulo, http://accumulo.apache.org/.

Acknowledgments

This work is supported in part by the National Key Basic Research and Development (973) Program of China (No. 2013CB329606), and the Co-construction Project of Beijing Municipal Commission of Education.

Author information

Corresponding author

Correspondence to Keran Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, K., Wu, B., Wang, B. (2015). A Distributed RDF Storage and Query Model Based on HBase. In: Xiao, X., Zhang, Z. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9391. Springer, Cham. https://doi.org/10.1007/978-3-319-23531-8_1

  • DOI: https://doi.org/10.1007/978-3-319-23531-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23530-1

  • Online ISBN: 978-3-319-23531-8

  • eBook Packages: Computer Science, Computer Science (R0)
