DOI: 10.1145/2428736.2428764
research-article

Towards big linked data: a large-scale, distributed semantic data storage

Published: 03 December 2012

ABSTRACT

In light of the challenges of effectively managing Big Data, we are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data and hence to automate some data analysis tasks that were not originally designed to be performed by computers. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and the large-scale storage of Big Data do not fit together easily. Meanwhile, the sheer data volume envisioned by Big Data rules out certain computationally expensive semantic technologies, which perform far less efficiently at this scale than on relatively small data sets.

In this paper, we propose a mechanism that allows LOD to take advantage of existing large-scale data stores while sustaining its "semantic" nature. We demonstrate how RDF-based semantic models can be distributed across multiple storage servers, and we examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. Our future work will focus on stress-testing the platform at the scale of tens of billions of triples, as well as on comparative studies of usability and performance against similar offerings.
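The abstract does not say how triples are assigned to storage servers. As a rough illustration only, the sketch below assumes subject-based hash partitioning on a consistent-hashing ring, a common technique in distributed key-value stores and not necessarily the authors' scheme. The class `TripleRing`, the server names and the virtual-node count are hypothetical. A triple pattern with a bound subject is routed to a single server; any other pattern is scattered to all servers and the partial results are merged, which is the step where parallel processing would apply.

```python
import bisect
import hashlib
from collections import defaultdict

# Hypothetical sketch: subject-based placement of RDF triples on a
# consistent-hashing ring. The partition key (subject), server names and
# vnode count are illustrative assumptions, not the paper's design.


def _hash(key: str) -> int:
    # Stable 64-bit ring position taken from the first 8 bytes of MD5.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TripleRing:
    def __init__(self, nodes, vnodes=64):
        self._nodes = list(nodes)
        # Each physical node owns several virtual points to smooth the load.
        self._ring = sorted(
            (_hash(f"{n}#{i}"), n) for n in self._nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]
        self._store = defaultdict(set)  # node name -> set of (s, p, o)

    def node_for(self, subject):
        # First ring point clockwise from the subject's hash, wrapping around.
        i = bisect.bisect_right(self._points, _hash(subject)) % len(self._points)
        return self._ring[i][1]

    def add(self, s, p, o):
        self._store[self.node_for(s)].add((s, p, o))

    def match(self, s=None, p=None, o=None):
        # Bound subject: query one node. Otherwise: scatter to every node.
        # A real deployment could run these per-node lookups in parallel;
        # this sketch runs them sequentially and merges the results.
        targets = [self.node_for(s)] if s is not None else self._nodes
        results = set()
        for node in targets:
            for t in self._store[node]:
                if ((s is None or t[0] == s)
                        and (p is None or t[1] == p)
                        and (o is None or t[2] == o)):
                    results.add(t)
        return results


if __name__ == "__main__":
    ring = TripleRing(["server-a", "server-b", "server-c"])
    ring.add("ex:alice", "foaf:knows", "ex:bob")
    ring.add("ex:bob", "foaf:knows", "ex:carol")
    ring.add("ex:alice", "foaf:name", '"Alice"')
    print(ring.node_for("ex:alice"), sorted(ring.match(s="ex:alice")))
    print(len(ring.match(p="foaf:knows")))
```

Hashing on the subject keeps all triples describing one resource on the same server, so subject-bound lookups touch a single node, at the cost of a scatter-gather step for patterns that bind only the predicate or object.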

          Published in

            ACM Other conferences
            IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
            December 2012
            432 pages
            ISBN:9781450313063
            DOI:10.1145/2428736

            Copyright © 2012 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States
