Abstract
Wikidata has become one of the most prominent open knowledge graphs (KGs) on the Web. Relying on a community of users with different expertise, this cross-domain KG is directly related to other data sources. This paper investigates how Wikidata is linked to other data sources in the Linked Data ecosystem. To this end, we adapt previous definitions of ontology links and instance links to the terminological part of the Wikidata vocabulary and perform an analysis of the links from Wikidata to external datasets and ontologies in the Linked Data ecosystem. As a side effect, this analysis reveals insights into the ontological expressiveness of the meta-properties used in Wikidata. The results show that while Wikidata defines a large number of individuals, classes and properties within its own namespace, these are not (yet) extensively linked. We discuss reasons for this and conclude with some suggestions to increase the interconnectedness of Wikidata with other KGs.
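The Python sketch below makes the distinction between ontology links and instance links concrete, under deliberately simplified assumptions:

- A link is a statement whose subject is a Wikidata entity and whose object is an HTTP(S) IRI outside the www.wikidata.org namespace.
- A link is an ontology link when its subject is a class or property, and an instance link otherwise. The classes and properties are supplied as a hypothetical precomputed set `terminological`.

The paper's adapted definitions are more detailed than this. The names `Triple`, `is_external_iri` and `classify_links` are illustrative only.

```python
from typing import Iterable, NamedTuple

# Assumption: anything under www.wikidata.org counts as "internal".
WD_BASES = ("http://www.wikidata.org/", "https://www.wikidata.org/")
WD_ENTITY = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


class Triple(NamedTuple):
    s: str  # subject IRI
    p: str  # property IRI
    o: str  # object IRI or literal


def is_external_iri(term: str) -> bool:
    """True if `term` looks like an HTTP(S) IRI outside the Wikidata namespace."""
    return term.startswith(("http://", "https://")) and not term.startswith(WD_BASES)


def classify_links(
    triples: Iterable[Triple], terminological: set[str]
) -> dict[str, list[Triple]]:
    """Split statements from Wikidata entities to external IRIs into
    ontology links (subject is a class/property) and instance links (otherwise).

    `terminological` is a hypothetical, precomputed set of subject IRIs that
    denote classes or properties; this sketch does not derive it.
    """
    links: dict[str, list[Triple]] = {"ontology": [], "instance": []}
    for t in triples:
        if t.s.startswith(WD_ENTITY) and is_external_iri(t.o):
            links["ontology" if t.s in terminological else "instance"].append(t)
    return links


# Toy triples shaped like Wikidata statements; the example.org target is hypothetical.
triples = [
    Triple(WD_ENTITY + "Q5", WDT + "P1709", "http://xmlns.com/foaf/0.1/Person"),
    Triple(WD_ENTITY + "Q42", WDT + "P2888", "http://example.org/douglas-adams"),
    Triple(WD_ENTITY + "Q42", WDT + "P31", WD_ENTITY + "Q5"),  # internal, not a link
]
links = classify_links(triples, terminological={WD_ENTITY + "Q5"})
print(len(links["ontology"]), len(links["instance"]))  # 1 1
```

Keeping `terminological` as an input leaves open how classes and properties are identified in Wikidata, which this sketch does not attempt.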
Notes
- 1.
- 2.
- 3.
- 4. For instance, the pattern {[] wdt:P279 ?X; wdt:P31 ?X.} indicates ambiguous "subclass of" vs. "instance of" usage on 2131 entities (query run on 9 Dec 2021 at https://w.wiki/4XQw).
- 5. See https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Top-level_ontology_list for the top two layers of the ontology.
- 6. i.e., there are 37 uses of P2445 in total in Wikidata as of August 2021.
- 7. Beyond second-order classes, there are also higher orders, i.e., third-, fourth- and fifth-order classes, each of which is an instance of the next higher-order class; all of them are subclasses of the fixed-order class (Q23959932).
- 8.
- 9.
- 10. We note again that subtle semantic differences, such as constraining (i.e., closed-world assumption, CWA) vs. implicit (i.e., open-world assumption, OWA) semantics of certain properties, are not relevant for the purpose of our link analysis.
- 11. Prefixes are used as follows: wd: <http://www.wikidata.org/entity/>, wdt: <http://www.wikidata.org/prop/direct/>, pq: <http://www.wikidata.org/prop/qualifier/>, p: <http://www.wikidata.org/prop/>, ps: <http://www.wikidata.org/prop/statement/>.
- 12.
- 13. All code implemented in Python is available at: https://github.com/arminhaller/LinksInLOD.
- 14.
- 15. No longitudinal data is published on the Wikidata site, but the number of properties grew by 3.4% between July and November 2021.
- 16. Some individuals might use more than one exact match relation.
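The triple pattern in Note 4 can be mimicked over a small in-memory list of triples. The Python sketch below returns every subject that is both a subclass of (wdt:P279) and an instance of (wdt:P31) the same ?X. The function name and toy identifiers are hypothetical, and this is not the query used for the reported count. Reporting subjects reflects our reading of "entities" in Note 4, which is an assumption.

```python
from collections import defaultdict


def ambiguous_subclass_instance(triples):
    """Subjects s with some ?X such that s wdt:P279 ?X and s wdt:P31 ?X.

    `triples` is an iterable of (subject, property, object) strings in
    prefixed form; only wdt:P279 (subclass of) and wdt:P31 (instance of) matter.
    """
    subclass_of = defaultdict(set)  # s -> {X : s wdt:P279 X}
    instance_of = defaultdict(set)  # s -> {X : s wdt:P31 X}
    for s, p, o in triples:
        if p == "wdt:P279":
            subclass_of[s].add(o)
        elif p == "wdt:P31":
            instance_of[s].add(o)
    # A subject matches if both maps share at least one object ?X.
    return sorted(s for s, xs in subclass_of.items() if xs & instance_of.get(s, set()))


toy = [  # hypothetical identifiers
    ("ex:a", "wdt:P279", "ex:C"),
    ("ex:a", "wdt:P31", "ex:C"),  # same ?X: ambiguous
    ("ex:b", "wdt:P279", "ex:C"),
    ("ex:b", "wdt:P31", "ex:D"),  # different ?X: not matched
]
print(ambiguous_subclass_instance(toy))  # ['ex:a']
```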
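The prefixes listed in Note 11 can be applied mechanically. The Python sketch below does three things:

- expands prefixed names to full IRIs;
- compacts IRIs back to prefixed names;
- emits the corresponding SPARQL PREFIX declarations.

The helper names `expand`, `compact` and `sparql_prologue` are hypothetical.

```python
# Prefix table exactly as listed in Note 11.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'wdt:P31' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(iri: str) -> str:
    """Turn a full IRI back into a prefixed name, if a namespace matches."""
    # Longest namespace first: p: is itself a prefix of the wdt:, ps: and pq: namespaces.
    for prefix, ns in sorted(PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True):
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return iri


def sparql_prologue() -> str:
    """PREFIX declarations for use at the top of a SPARQL query."""
    return "\n".join(f"PREFIX {k}: <{v}>" for k, v in PREFIXES.items())


print(expand("wdt:P31"))  # http://www.wikidata.org/prop/direct/P31
print(compact("http://www.wikidata.org/prop/qualifier/P580"))  # pq:P580
```

`compact` tries the longest namespace first. Checking p: first would turn every direct-property, statement or qualifier IRI into a wrong p: name.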
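Note 16 implies that the number of exact-match statements can exceed the number of individuals that carry them. The Python sketch below computes both counts. It assumes the exact match property is wdt:P2888 and uses hypothetical toy subjects and targets. The property set is a parameter, so other matching properties can be added.

```python
from collections import Counter


def exact_match_stats(triples, exact_match_props):
    """Compare the number of exact-match statements with the number of
    distinct subjects (individuals) that carry them."""
    per_subject = Counter(s for s, p, _ in triples if p in exact_match_props)
    return {
        "statements": sum(per_subject.values()),
        "individuals": len(per_subject),
        "with_multiple": sum(1 for n in per_subject.values() if n > 1),
    }


toy = [  # hypothetical subjects and targets
    ("ex:item1", "wdt:P2888", "http://example.org/a"),
    ("ex:item1", "wdt:P2888", "http://example.org/b"),
    ("ex:item2", "wdt:P2888", "http://example.org/c"),
    ("ex:item2", "wdt:P31", "wd:Q5"),
]
print(exact_match_stats(toy, {"wdt:P2888"}))
# {'statements': 3, 'individuals': 2, 'with_multiple': 1}
```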
Acknowledgment
This research has received funding from the Teaming.AI project, which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No 957402.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Haller, A., Polleres, A., Dobriy, D., Ferranti, N., Rodríguez Méndez, S.J. (2022). An Analysis of Links in Wikidata. In: Groth, P., et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_2
DOI: https://doi.org/10.1007/978-3-031-06981-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06980-2
Online ISBN: 978-3-031-06981-9
eBook Packages: Computer Science, Computer Science (R0)