TripleID: A Low-Overhead Representation and Querying Using GPU for Large RDFs

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)

Abstract

Resource Description Framework (RDF) is a commonly used format for semantic web processing. It consists of strings that represent terms and their relationships, which can be queried or used for inference. An RDF dataset is usually a large text file containing many millions of relationships. In this work, we propose TripleID, a framework for processing queries over large RDF data. The framework uses Graphics Processing Units (GPUs) to search RDF relations. The RDF data is first transformed into an encoded form suitable for storage in GPU memory; parallel threads on the GPU then search for the required data. Our experiments show that a single GPU in a personal desktop can handle 100 million triples, whereas a traditional RDF processing tool can process up to 10 million triples. Furthermore, the GPU answers sample queries within 0.18 s on 7 million triples, whereas the traditional tool takes at least 6 s on 1.8 million triples.
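
The encode-then-scan idea described above can be illustrated with a minimal sketch. The abstract does not specify the encoding or the GPU kernel, so everything here is an assumption made for illustration: a single shared dictionary maps each term to an integer ID, ID 0 is reserved as a wildcard, and the names `encode`, `matches`, and `search` are hypothetical. The scan runs sequentially on the CPU; on a GPU, each thread would plausibly apply the `matches` test to its own triple.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]

WILDCARD = 0  # assumption: ID 0 is reserved to mean "match any term"


def encode(triples: List[Triple]) -> Tuple[List[IdTriple], Dict[str, int]]:
    """Replace every term string with a compact integer ID (IDs start at 1)."""
    ids: Dict[str, int] = {}
    encoded: List[IdTriple] = []
    for s, p, o in triples:
        # setdefault assigns the next free ID only to terms not seen before
        row = tuple(ids.setdefault(term, len(ids) + 1) for term in (s, p, o))
        encoded.append(row)
    return encoded, ids


def matches(row: IdTriple, pattern: IdTriple) -> bool:
    """Per-triple test: every pattern slot is a wildcard or equals the ID."""
    return all(q == WILDCARD or q == v for v, q in zip(row, pattern))


def search(encoded: List[IdTriple], ids: Dict[str, int],
           s: Optional[str] = None, p: Optional[str] = None,
           o: Optional[str] = None) -> List[int]:
    """Return the indices of triples matching the (s, p, o) pattern."""
    pattern = []
    for term in (s, p, o):
        if term is None:
            pattern.append(WILDCARD)
        elif term in ids:
            pattern.append(ids[term])
        else:
            return []  # a term that was never encoded cannot match anything
    # Sequential stand-in for the parallel scan over all encoded triples
    return [i for i, row in enumerate(encoded) if matches(row, tuple(pattern))]


triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:age", "30"),
]
encoded, ids = encode(triples)
knows_rows = search(encoded, ids, p="ex:knows")
```

Here `knows_rows` holds the positions of the two `ex:knows` triples. Because the scan compares fixed-width integers instead of strings, each comparison is independent of the others, which is what makes this kind of search a natural fit for data-parallel hardware.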

Acknowledgement

This work was supported in part by the following institutes and research programs: The Thailand Research Fund (TRF) through the Royal Golden Jubilee Ph.D. Program under Grant PHD/0005/2554, DAAD (German Academic Exchange Service) Scholarship project id: 57084841, NVIDIA Hardware grant, and the Faculty of Engineering at Kasetsart University Research funding contract no. 57/12/MATE.

Author information

Corresponding author

Correspondence to Chantana Chantrapornchai.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chantrapornchai, C., Choksuchat, C., Haidl, M., Gorlatch, S. (2016). TripleID: A Low-Overhead Representation and Querying Using GPU for Large RDFs. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_31

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
