Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata Using LLMs

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2024)

Abstract

Several initiatives have been undertaken to conceptually model the domain of scholarly data using ontologies and to create corresponding Knowledge Graphs. Yet their full potential remains untapped: automated means for populating these ontologies are lacking, and the respective initiatives from the Semantic Web community are not necessarily connected. We propose to make scholarly data more sustainably accessible by leveraging Wikidata’s infrastructure and automating its population through LLMs, tapping into unstructured sources such as conference websites and proceedings texts as well as existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help to populate, evolve, and maintain scholarly data as a community within Wikidata.

Our main contributions include (a) an analysis of ontologies for representing scholarly data to identify gaps and relevant entities/properties in Wikidata, and (b) semi-automated extraction, requiring minimal manual validation, of conference metadata (e.g., acceptance rates, organizer roles, programme committee members, best paper awards, keynotes, and sponsors) from websites and proceedings texts using LLMs. Finally, we discuss (c) extensions to visualization tools in the Wikidata context for exploring the generated scholarly data. Our study focuses on data from 105 Semantic Web-related conferences and extends/adds more than 6000 entities in Wikidata. The method can also be applied more generally, beyond Semantic Web-related conferences, to enhance Wikidata’s utility as a comprehensive scholarly resource. Source Repository: https://github.com/scholarly-wikidata/

 DOI: https://doi.org/10.5281/zenodo.10989709

 License: Creative Commons CC0 (Data), MIT (Code).
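
The kind of conference metadata listed in the abstract, together with the manual validation step, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ConferenceRecord` class, its field names, and the `ready_for_upload` helper are hypothetical and are not taken from the authors' repository. The sketch only shows how an acceptance rate could be derived from submission and acceptance counts, and how unvalidated records could be held back.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConferenceRecord:
    """Hypothetical container for metadata extracted for one conference edition."""
    name: str
    submissions: Optional[int] = None   # number of submitted papers, if found
    accepted: Optional[int] = None      # number of accepted papers, if found
    keynotes: List[str] = field(default_factory=list)
    sponsors: List[str] = field(default_factory=list)
    validated: bool = False             # set to True only after a human check

    def acceptance_rate(self) -> Optional[float]:
        """Return accepted / submissions, or None if counts are missing or inconsistent."""
        if self.submissions is None or self.accepted is None:
            return None
        if self.submissions <= 0 or not 0 <= self.accepted <= self.submissions:
            return None
        return self.accepted / self.submissions


def ready_for_upload(records: List[ConferenceRecord]) -> List[ConferenceRecord]:
    """Keep only the records that a human has validated."""
    return [r for r in records if r.validated]


if __name__ == "__main__":
    # Made-up example values, not data from the paper.
    rec = ConferenceRecord("Example Conf 2024", submissions=200, accepted=50, validated=True)
    print(rec.acceptance_rate())
    print([r.name for r in ready_for_upload([rec])])
```

Returning `None` for missing or inconsistent counts, rather than raising, is one possible design choice: it lets such records be flagged for manual review instead of stopping a batch.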

Notes

  1. https://www.ieee.org/about/at-a-glance.html
  2. https://www.acm.org/conferences/about-conferences
  3. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/
  4. https://openalex.org/
  5. https://www.wikidata.org/wiki/Wikidata:Statistics
  6. https://www.wikidata.org/wiki/Q119153957
  7. https://bit.ly/3Vs6XNc
  8. https://orkg.org/organizations/Event
  9. https://scholia.toolforge.org/
  10. https://synia.toolforge.org
  11. https://openresearch.org
  12. https://orkg.org/organizations/Event
  13. https://meta.wikimedia.org/wiki/WikiCite
  14. https://www.openresearch.org/wiki/ISWC
  15. https://github.com/lixin4ever/Conference-Acceptance-Rate
  16. https://www.langchain.com/langchain
  17. https://python.langchain.com/docs/modules/data_connection/document_transformers/semantic-chunker/
  18. Link provided at the end of the paper.
  19. https://blog.dblp.org/2022/03/02/dblp-in-rdf/
  20. http://www.scholarlydata.org/sparql/
  21. https://github.com/scholarly-wikidata/scholarly-wikidata/wiki/Mapping-scholarly-data-ontologies-to-Wikidata
  22. https://www.wikidata.org/wiki/Wikidata:Property_proposal/number_of_submissions
  23. https://www.wikidata.org/wiki/Wikidata:Property_proposal/number_of_accepted_contributions
  24. https://w.wiki/9mWB
  25. https://w.wiki/9nnJ
  26. https://github.com/scholarly-wikidata/scholarly-wikidata/wiki/Scholary-Wikidata-Query-Examples
  27. https://github.com/scholarly-wikidata/scholarly-wikidata/blob/fa6bbdc78f69df81ae12b45d8537e1977eee8aa6/docs/EKAW_2024_Paper_Appendix.pdf

Acknowledgements

This research was funded in whole or in part by the Austrian Science Fund (FWF) [10.55776/COE12].

Author information

Corresponding author

Correspondence to Nandana Mihindukulasooriya.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mihindukulasooriya, N., Tiwari, S., Dobriy, D., Nielsen, F.Å., Chhetri, T.R., Polleres, A. (2025). Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata Using LLMs. In: Alam, M., Rospocher, M., van Erp, M., Hollink, L., Gesese, G.A. (eds) Knowledge Engineering and Knowledge Management. EKAW 2024. Lecture Notes in Computer Science, vol 15370. Springer, Cham. https://doi.org/10.1007/978-3-031-77792-9_15

  • DOI: https://doi.org/10.1007/978-3-031-77792-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77791-2

  • Online ISBN: 978-3-031-77792-9

  • eBook Packages: Computer Science, Computer Science (R0)