Using Pregel to Create Knowledge Graphs Subsets Described by Non-recursive Shape Expressions

  • Conference paper
Knowledge Graphs and Semantic Web (KGSWC 2023)

Abstract

Knowledge graphs have been widely adopted in recent years, both general-purpose ones, like Wikidata, and domain-specific ones, like UniProt. Their increasing size poses new challenges to their practical usage. For example, Wikidata has grown steadily since its inception, making its data difficult to download and process. The structure of Wikidata items is flexible and tends to be heterogeneous: the shape of an entity representing a human is distinct from that of a mountain. Recently, Wikidata adopted Entity Schemas to facilitate the definition of different schemas using Shape Expressions, a language that can be used to describe and validate RDF data. In this paper, we present an approach to obtain subsets of knowledge graphs based on Shape Expressions, using an implementation of the Pregel algorithm written in Rust. We have applied our approach to obtain subsets of Wikidata and UniProt and present some results of these experiments.

Notes

  1. https://www.w3.org/TR/2017/REC-shacl-20170720/

  2. https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas/Subsetting

  3. https://github.com/elixir-europe/BioHackathon-projects-2020/tree/master/projects/35

  4. https://rdfshape.weso.es/link/16902825958

  5. https://github.com/weso/pregel-rs

  6. https://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/

  7. https://github.com/angelip2303/pschema-rs/tree/main/examples/from_uniprot

Acknowledgements

This project has received funding from NumFOCUS, a non-profit organization promoting open-source scientific projects, and has been supported by the ANGLIRU project, funded by the Spanish Agency for Research. The opinions and arguments employed herein do not reflect the official views of these organizations.

Author information

Corresponding author

Correspondence to Ángel Iglesias Préstamo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Préstamo, Á.I., Gayo, J.E.L. (2023). Using Pregel to Create Knowledge Graphs Subsets Described by Non-recursive Shape Expressions. In: Ortiz-Rodriguez, F., Villazón-Terrazas, B., Tiwari, S., Bobed, C. (eds) Knowledge Graphs and Semantic Web. KGSWC 2023. Lecture Notes in Computer Science, vol 14382. Springer, Cham. https://doi.org/10.1007/978-3-031-47745-4_10

  • DOI: https://doi.org/10.1007/978-3-031-47745-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47744-7

  • Online ISBN: 978-3-031-47745-4

  • eBook Packages: Computer Science, Computer Science (R0)
