DOI: 10.1145/3555041.3589723
short-paper

SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes

Published: 05 June 2023

ABSTRACT

We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs). Shapes represent a specific form of data patterns, akin to schemas for entities. Standard shape extraction approaches are likely to produce thousands of shapes, some of which represent spurious constraints extracted because of erroneous data in the KG. Given a KG with tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shape extraction algorithm and outputs SHACL shape constraints. The extracted shapes are further annotated with statistical information about their support in the graph, which makes it possible to identify both erroneous and missing triples in the KG. Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs. Furthermore, it helps the user find and correct errors by automatically generating SPARQL queries over the graph that retrieve the nodes and facts behind the spurious shapes, so that the user can intervene by amending the data.
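
To make the idea of support-annotated shapes concrete, the following is a minimal sketch in plain Python, not SHACTOR's actual algorithm. It assumes that the support of a candidate constraint is the number of entities of a class that satisfy it, and that a relative measure (called confidence here) is that count divided by the class size; both definitions, the toy triples, the `ex:` names, and all function names are illustrative assumptions rather than details taken from the abstract.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Toy graph as (subject, predicate, object) triples; every name is hypothetical.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:carol", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:carol", "ex:worksFor", "ex:alice"),  # suspicious: a Person as employer
]


def candidate_shapes(triples):
    """Map (class, property, object class) to (support, confidence).

    Assumed definitions: support is the number of entities of the class
    with at least one matching triple; confidence is support divided by
    the number of entities of the class.
    """
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    class_size = defaultdict(int)
    for classes in types.values():
        for c in classes:
            class_size[c] += 1

    matching = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for c in types.get(s, ()):
            for oc in types.get(o, ()):
                matching[(c, p, oc)].add(s)

    return {
        key: (len(subjects), len(subjects) / class_size[key[0]])
        for key, subjects in matching.items()
    }


def low_support(shapes, min_confidence):
    """Return candidate constraints whose confidence is below the threshold."""
    return sorted(k for k, (_, conf) in shapes.items() if conf < min_confidence)


def offending_nodes_query(cls, prop, obj_cls):
    """Build a SPARQL query listing the triples behind one candidate constraint.

    The PREFIX declaration for ex: is omitted and must be added before use.
    """
    return (
        "SELECT ?s ?o WHERE { "
        f"?s a {cls} ; {prop} ?o . ?o a {obj_cls} . "
        "}"
    )


if __name__ == "__main__":
    shapes = candidate_shapes(TRIPLES)
    for key in low_support(shapes, min_confidence=0.5):
        print(key, shapes[key])
        print(offending_nodes_query(*key))
```

On this toy graph, the constraint that a Person works for another Person is backed by a single entity, so it falls below the threshold and the generated query points at the triple a user might want to inspect or amend.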

Supplemental Material

SIGMOD23-demo43.mp4 (mp4, 611.7 MB)


          Published in

          SIGMOD '23: Companion of the 2023 International Conference on Management of Data
          June 2023
          330 pages
          ISBN: 9781450395076
          DOI: 10.1145/3555041

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 785 of 4,003 submissions, 20%
