Abstract
Several initiatives have conceptually modelled the domain of scholarly data using ontologies and created corresponding Knowledge Graphs. Yet their full potential remains untapped: automated means for populating these ontologies are lacking, and the respective initiatives from the Semantic Web community are not necessarily connected. We propose to make scholarly data more sustainably accessible by leveraging Wikidata’s infrastructure and automating its population through LLMs, tapping into unstructured sources such as conference Web sites and proceedings texts as well as existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help the community populate, evolve, and maintain scholarly data within Wikidata.
Our main contributions include (a) an analysis of ontologies for representing scholarly data to identify gaps and relevant entities/properties in Wikidata, and (b) semi-automated extraction, requiring minimal manual validation, of conference metadata (e.g., acceptance rates, organizer roles, programme committee members, best paper awards, keynotes, and sponsors) from websites and proceedings texts using LLMs. Finally, we discuss (c) extensions to visualization tools in the Wikidata context for exploring the generated scholarly data. Our study focuses on data from 105 Semantic Web-related conferences and extends or adds more than 6000 entities in Wikidata. The method should also apply more generally, beyond Semantic Web-related conferences, to enhance Wikidata’s utility as a comprehensive scholarly resource.
Source Repository: https://github.com/scholarly-wikidata/
DOI: https://doi.org/10.5281/zenodo.10989709
License: Creative Commons CC0 (Data), MIT (Code).
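The semi-automated extraction step described in the abstract can be pictured with a minimal sketch. It is not the authors' pipeline: the field names, the prompt wording, the `Candidate` class, and the `build_prompt` and `parse_response` helpers are all hypothetical, and the LLM call itself is left out. The sketch only shows how an LLM's JSON answer might be turned into candidate Wikidata statements that stay unvalidated until a human has checked them.

```python
import json
from dataclasses import dataclass

# Illustrative mapping from hypothetical extraction fields to Wikidata
# property IDs; the paper's actual property selection may differ.
FIELD_TO_PROPERTY = {
    "title": "P1476",       # title
    "start_date": "P580",   # start time
    "end_date": "P582",     # end time
    "website": "P856",      # official website
}


@dataclass
class Candidate:
    prop: str
    value: str
    validated: bool = False  # set to True only after manual validation


def build_prompt(page_text: str) -> str:
    """Build a (hypothetical) extraction prompt for a conference web page."""
    fields = ", ".join(FIELD_TO_PROPERTY)
    return (
        "Extract the following conference metadata as a JSON object "
        f"with keys {fields}. Use null for anything not stated.\n\n"
        f"{page_text}"
    )


def parse_response(raw: str) -> list[Candidate]:
    """Turn the model's raw answer into unvalidated candidate statements."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output yields no candidates
    if not isinstance(data, dict):
        return []
    candidates = []
    for field, prop in FIELD_TO_PROPERTY.items():
        value = data.get(field)
        # Skip nulls, non-strings, and keys the prompt did not ask for.
        if isinstance(value, str) and value.strip():
            candidates.append(Candidate(prop, value.strip()))
    return candidates


if __name__ == "__main__":
    # Placeholder response; in practice this string would come from an LLM.
    raw = '{"title": "Example Conf 2024", "website": null}'
    for c in parse_response(raw):
        print(c.prop, c.value, "validated" if c.validated else "needs review")
```

Keeping `validated` false by default reflects the abstract's point that extraction is semi-automated: nothing would be written to Wikidata until a reviewer has confirmed the candidate.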
Acknowledgements
This research was funded in whole or in part by the Austrian Science Fund (FWF) [10.55776/COE12].
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mihindukulasooriya, N., Tiwari, S., Dobriy, D., Nielsen, F.Å., Chhetri, T.R., Polleres, A. (2025). Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata Using LLMs. In: Alam, M., Rospocher, M., van Erp, M., Hollink, L., Gesese, G.A. (eds) Knowledge Engineering and Knowledge Management. EKAW 2024. Lecture Notes in Computer Science, vol 15370. Springer, Cham. https://doi.org/10.1007/978-3-031-77792-9_15
DOI: https://doi.org/10.1007/978-3-031-77792-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77791-2
Online ISBN: 978-3-031-77792-9
eBook Packages: Computer Science, Computer Science (R0)