DOI:10.1145/2484712.2484713
research-article

Large-scale bisimulation of RDF graphs

Published: 23 June 2013

Abstract

RDF datasets with billions of triples are no longer unusual and continue to grow (e.g., the LOD cloud), driven by the inherent flexibility of RDF, which can represent very diverse datasets ranging from highly structured to unstructured data. Because of their size, RDF graphs are often difficult to understand and process, so methods that reduce their size while keeping as much of their structural information as possible become attractive. In this paper we study bisimulation as a means of reducing the size of RDF graphs according to structural equivalence. We study two bisimulation algorithms, one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, making it possible to compute the bisimulation of very large RDF graphs in a time that is far out of reach for the sequential version. Experiments on synthetic benchmark data and real data (DBpedia) show a reduction of more than 90% in the size of the RDF graph, measured as the number of nodes compared with the number of blocks in the resulting bisimulation partition.
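
To make the idea of a bisimulation partition concrete, the following is a minimal in-memory sketch in Python. It is not the SQL or MapReduce algorithm of the paper. It assumes a simple forward bisimulation in which subjects and objects are nodes and predicates are edge labels, and it uses naive signature-based partition refinement. The function name `bisimulation_partition` and the toy triples are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def bisimulation_partition(triples):
    """Map each node to a block id of a forward bisimulation partition.

    Assumption for illustration: nodes are all subjects and objects,
    and two nodes are equivalent when they have the same labelled
    outgoing edges into equivalent nodes.
    """
    nodes = set()
    out_edges = defaultdict(list)
    for s, p, o in triples:
        nodes.add(s)
        nodes.add(o)
        out_edges[s].append((p, o))

    # Start with every node in one block and refine until stable.
    block = {n: 0 for n in nodes}
    num_blocks = 1
    while True:
        ids = {}
        new_block = {}
        for n in sorted(nodes):
            # Signature: the node's current block plus the set of
            # (predicate, block of target) pairs over its outgoing edges.
            # Keeping the current block in the signature guarantees that
            # blocks are only ever split, never merged.
            sig = (block[n],
                   frozenset((p, block[o]) for p, o in out_edges[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        if len(ids) == num_blocks:
            return new_block  # no block was split: partition is stable
        block, num_blocks = new_block, len(ids)


# Toy data (hypothetical): alice and bob have the same structure.
triples = [
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("carol", "type", "Person"),
]
blocks = bisimulation_partition(triples)
num_nodes = len(blocks)
num_block_ids = len(set(blocks.values()))
print(f"{num_nodes} nodes reduced to {num_block_ids} blocks")
```

In this toy graph, `alice` and `bob` fall into the same block because both have a `knows` edge to `carol` and a `type` edge to `Person`. The ratio of blocks to nodes is the kind of size reduction the abstract reports, here on a far smaller scale.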



Published In

SWIM '13: Proceedings of the Fifth Workshop on Semantic Web Information Management
June 2013
50 pages
ISBN:9781450321945
DOI:10.1145/2484712

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MapReduce
  2. RDF
  3. bisimulation
  4. semantic web

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'13