Abstract
There is an increasing trend of using Linked Datasets for creating embeddings from URI sequences, since such embeddings can be exploited for several tasks, e.g., machine learning problems and tasks related to content-based similarity. Existing techniques exploit either a single dataset or a few datasets (or RDF graphs) for creating URI sequences for one or more entities. However, no available approach combines data from multiple datasets to enrich the URI sequences for a given entity. For this reason, we introduce a prototype, called LODVec, that exploits the LODsyndesis knowledge graph, which is the largest knowledge graph including all inferred equivalence relationships. LODVec exploits this graph for creating URI sequences for millions of entities by combining data from 400 datasets, and it offers several configurable options, based on metadata (e.g., provenance), for creating such URI sequences. Moreover, it uses the produced URI sequences as input for creating URI embeddings through the word2vec model. We evaluate the gain of exploiting several datasets (instead of a single one or a few) and the impact of cross-dataset reasoning for machine-learning-based tasks (i.e., classification and regression), and we compare the effectiveness of several configurations and machine learning models.
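To make the idea of enriching URI sequences across datasets concrete, the following is a minimal Python sketch under stated assumptions; it is not the LODVec implementation. The toy quadruples, the `ds1:`/`ds2:`/`ds3:` prefixes, the dataset names, and the function names `build_canonical_map` and `uri_sequences` are all hypothetical. Equivalent URIs are merged with a simple union-find over equivalence pairs, and an optional provenance filter restricts which datasets contribute.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object, source dataset).
QUADS = [
    ("ds1:Aristotle", "ds1:birthPlace", "ds1:Stagira", "dataset1"),
    ("ds2:Aristotle", "ds2:occupation", "ds2:Philosopher", "dataset2"),
    ("ds3:Aristotle", "ds3:influenced", "ds3:Alexander", "dataset3"),
]

# Hypothetical equivalence pairs (the kind of links owl:sameAs expresses).
SAME_AS = [
    ("ds1:Aristotle", "ds2:Aristotle"),
    ("ds2:Aristotle", "ds3:Aristotle"),
]


def build_canonical_map(pairs):
    """Map every URI in `pairs` to one representative of its equivalence class."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller URI as the representative
            # so that the result is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
    return {uri: find(uri) for uri in list(parent)}


def uri_sequences(quads, same_as, allowed_datasets=None):
    """Group [subject, predicate, object] sequences by canonical subject.

    `allowed_datasets` is an assumed provenance option: when given, only
    quadruples coming from those datasets contribute sequences.
    """
    canon = build_canonical_map(same_as)
    sequences = defaultdict(list)
    for s, p, o, dataset in quads:
        if allowed_datasets is not None and dataset not in allowed_datasets:
            continue
        s, o = canon.get(s, s), canon.get(o, o)
        sequences[s].append([s, p, o])
    return dict(sequences)


if __name__ == "__main__":
    for entity, seqs in uri_sequences(QUADS, SAME_AS).items():
        print(entity, seqs)
```

In this sketch, the three dataset-specific URIs collapse into a single entity with three sequences, whereas a single-dataset configuration would yield only one. Each inner list could then be passed as a "sentence" of tokens to a word2vec implementation to obtain URI embeddings.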
Acknowledgements
The research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under the HFRI PhD Fellowship grant (GA. No. 166).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mountantonakis, M., Tzitzikas, Y. (2019). Knowledge Graph Embeddings over Hundreds of Linked Datasets. In: Garoufallou, E., Fallucchi, F., William De Luca, E. (eds) Metadata and Semantic Research. MTSR 2019. Communications in Computer and Information Science, vol 1057. Springer, Cham. https://doi.org/10.1007/978-3-030-36599-8_13
DOI: https://doi.org/10.1007/978-3-030-36599-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36598-1
Online ISBN: 978-3-030-36599-8
eBook Packages: Computer Science, Computer Science (R0)