Cluster Robust Inference for Embedding-Based Knowledge Graph Completion

Schramm, Simon; Niklas, Ulrich; Schmid, Ute

doi:10.1007/978-3-031-40283-8_25

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14117))

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

530 Accesses

Abstract

Knowledge Graphs (KGs) are able to structure and represent knowledge in complex systems, whereby their completeness impacts the quality of any further application. Real world KGs are notoriously incomplete, which is why KG Completion (KGC) methods emerged. Most KGC methods rely on Knowledge Graph Embedding (KGE) based link prediction models, which provide completion candidates for a sparse KG. Metrics like the Mean Rank, the Mean Reciprocal Rank or Hits@K evaluate the quality of those models, like TransR, DistMult or ComplEx. Based on the principle of supervised learning, these metrics evaluate a KGC model trained on a training dataset, based on the partition of true completion candidates it achieves on a test dataset. Dealing with real world, complex KGs, we found that sparsity is not equally distributed across a KG, but rather grouped in clusters. We use modularity-based KG clustering, to approximate sparsity levels in a KG. Furthermore, we postulate that prediction errors of an embedding-based KGC model are correlated within clusters of a KG but uncorrelated between them and formalize a new, cluster-robust KGC evaluation metric. We test our metric using six benchmark dataset and one real-world industrial example dataset. Our experiments show its superiority to existing metrics with regards to the prediction of cluster-robust triplets\(^{1}\)(\(^{1}\)The code is available at https://github.com/simoncharmms/crmrr.).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
In general, those graphs are referred to as Label-Property-Graphs and are opposed to graphs where also a pre-defined scheme for rules over entities, referred to as an ontology, is underlying [25]. In this paper, we focus on Label-Property-Graphs.

References

Akrami, F., Saeef, M.S., Zhang, Q., Hu, W., Li, C.: Realistic re-evaluation of knowledge graph completion methods: an experimental study (2020). https://doi.org/10.1145/3318464.3380599
Ali, M., et al.: Bringing light into the dark: a large-scale evaluation of knowledge graph embedding models under a unified framework. CoRR (2020). https://doi.org/10.1109/TPAMI.2021.3124805
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2007). https://doi.org/10.1117/1.2819119
Biswas, R.: Embedding based link prediction for knowledge graph completion. In: Proceedings of the 29th ACM International Conference on Information and Knowledge Management, pp. 3221–3224. ACM, Virtual Event, Ireland (2020). https://doi.org/10.1145/3340531.3418512
Bordes, A., Usunier, N., Garcia-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, vol. 2, pp. 2787–2795. Curran Associates Inc., Red Hook, NY, USA (2013). https://doi.org/10.5555/2999792.2999923
Chen, Z., Wang, Y., Zhao, B., Cheng, J., Zhao, X., Duan, Z.: Knowledge graph completion: a review. IEEE Access 8, 192435–192456 (2020). https://doi.org/10.1109/ACCESS.2020.3030076
Article Google Scholar
Colin Cameron, A., Miller, D.L.: A practitioner’s guide to cluster-robust inference. J. Hum. Resourc. 50(2), 317–372 (2015). https://doi.org/10.3368/jhr.50.2.317
Article Google Scholar
Esarey, J., Menger, A.: Practical and effective approaches to dealing with clustered data. Polit. Sci. Res. Methods 7(3), 541–559 (2019). https://doi.org/10.1017/psrm.2017.42
Article Google Scholar
Fagiolo, G.: Clustering in complex directed networks. Phys. Rev. E 76, 026107 (2007). https://doi.org/10.1103/PhysRevE.76.026107
Article Google Scholar
Fayyad, U., Smyth, P., Piatetsky-Shapiro, G.: From data mining to knowledge discovery in databases. AI Mag. 17(3), 37–54 (1996). https://doi.org/10.1609/aimag.v17i3.1230
Article Google Scholar
Fuhr, N.: Some common mistakes in IR evaluation, and how they can be avoided. ACM SIGIR Forum 51(3), 32–41 (2018). https://doi.org/10.1145/3190580.3190586
Article Google Scholar
Hoyt, C.T., Berrendorf, M., Galkin, M., Tresp, V., Gyori, B.M.: A unified framework for rank-based evaluation metrics for link prediction in knowledge graphs. Pre-print (2022). https://doi.org/10.48550/ARXIV.2203.07544
Kumar, S., Hooi, B., Makhija, D., Kumar, M., Faloutsos, C., Subrahmanian, V.: Rev2: fraudulent user prediction in rating platforms. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 333–341. ACM (2018). https://doi.org/10.1145/3159652.3159729
Kumar, S., Spezzano, F., Subrahmanian, V., Faloutsos, C.: Edge weight prediction in weighted signed networks. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 221–230. IEEE (2016). https://doi.org/10.1109/ICDM.2016.0033
Lee, J.R., Gharan, S.O., Trevisan, L.: Multiway spectral partitioning and higher-order Cheeger inequalities. J. ACM 61(6), 1–30 (2014). https://doi.org/10.1145/2665063
Article MathSciNet MATH Google Scholar
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters (2008). https://doi.org/10.1080/15427951.2009.10129177
Leung, M.P.: Network cluster-robust inference. Pre-print (2021). https://doi.org/10.48550/ARXIV.2103.01470
Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1 (2015). https://doi.org/10.1609/aaai.v29i1.9491
MacKinnon, J.G.: How cluster-robust inference is changing applied econometrics. Can. J. Econ./Revue canadienne d’économique 52(3), 851–881 (2019). https://doi.org/10.1111/caje.12388
Article Google Scholar
MacKinnon, J.G., Nielsen, M.Ø., Webb, M.D.: Cluster-robust inference: a guide to empirical practice. J. Econometrics, S0304407622000781 (2022). https://doi.org/10.1016/j.jeconom.2022.04.001
Marina Speranskaya, M.S.: Ranking vs. classifying: measuring knowledge base completion quality. Pre-print (2020). https://doi.org/10.24432/C57G65
McAuley, J., Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems. RecSys ’13, pp. 165–172. Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2507157.2507163
Mirkin, B.G.: Clustering for Data Mining: A Data Recovery Approach. No. 3 in Computer Science and Data Analysis Series, Chapman & Hall/CRC, Boca Raton, FL (2005). https://doi.org/10.1201/9781420034912
Paranjape, A., Benson, A.R., Leskovec, J.: Motifs in temporal networks. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM (2017). https://doi.org/10.1145/3018661.3018731
Paulheim, H.: Knowledge graph refinement: a survey of approaches and evaluation methods. Semant. Web 8(3), 489–508 (2016). https://doi.org/10.3233/SW-160218
Article Google Scholar
Rosa, G.J.M.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, T., Tibshirani, R., Friedman, J. Biometrics 66(4), 1315–1315 (2010). https://doi.org/10.1111/j.1541-0420.2010.01516.x
Rozemberczki, B., Allen, C., Sarkar, R.: Multi-scale attributed node embedding (2019). https://doi.org/10.1093/comnet/cnab014
Traag, V.A., Waltman, L., van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9(1), 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
Article Google Scholar
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction. Pre-print (2016). https://doi.org/10.48550/ARXIV.1606.06357
van Engelen, J.E., Hoos, H.H.: A survey on semi-supervised learning. Mach. Learn. 109(2), 373–440 (2019). https://doi.org/10.1007/s10994-019-05855-6
Article MathSciNet MATH Google Scholar
White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48(4), 817 (1980). https://doi.org/10.2307/1912934
Article MathSciNet MATH Google Scholar
Yang, B., Yih, W.T., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. Pre-print (2014). https://doi.org/10.48550/ARXIV.1412.6575
Yang, H., Lin, Z., Zhang, M.: Rethinking knowledge graph evaluation under the open-world assumption (2022). https://doi.org/10.48550/ARXIV.2209.08858

Download references

Author information

Authors and Affiliations

University of Bamberg, Kapuzinerstraße 16, 96047, Bamberg, Germany
Simon Schramm & Ute Schmid
BMW Group, Knorrstraße 147, 80807, Munich, Germany
Simon Schramm & Ulrich Niklas

Authors

Simon Schramm
View author publications
You can also search for this author in PubMed Google Scholar
Ulrich Niklas
View author publications
You can also search for this author in PubMed Google Scholar
Ute Schmid
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Simon Schramm .

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhi Jin
South China Normal University, Guangzhou, China
Yuncheng Jiang
Babeș-Bolyai University, Cluj-Napoca, Romania
Robert Andrei Buchmann
Ulster University, Belfast, UK
Yaxin Bi
Babeș-Bolyai University, Cluj-Napoca, Romania
Ana-Maria Ghiran
South China Normal University, Guangzhou, China
Wenjun Ma

Appendix

1.1 Appendix A: Relevant Notations of this Paper

Notation	Meaning
G	A graph
\( \lbrace (h, r, t) \rbrace \in \mathcal {T}\)	A set of triples with heads, relations and tails
\(e \in \mathcal {E}\)	A set of entities e
\(r \in \mathcal {R}\)	A set of relations r
\(R_r \in \mathbb {R}^{d \times d}\)	A \(d \times d\) dimensional matrix with coefficients \(a_{ji}\)
\(r_r \in \mathbb {R}^{d}\)	A d-dimensional vector
\(k^{out}=\sum _j a_{ji}\)	The outbound degree
\(k_i^{in}=\sum _j a_{ij}\)	The inbound degree
Deg	The total degree of a graph
T(e)	The number of triangles through entity e
Trans	The transitivity of a graph
CCO	The mean clustering coefficient of a graph
\(r_{cc}\)	The rank of a cross-cluster completion candidate
n	The total count of ranks \(r_{i}\)
m	The count of cross-cluster ranks \(r_{j \vert cc}\) from \( \lbrace (h,r,t)_{cc} \rbrace \)
\(\|c \|\)	The count of clusters of a clustered KG
\((h,r,t)_{cc}\)	A cross-cluster completion candidate

1.2 Appendix B: Metrics of Used Knowledge Graphs

	Industry KG	\(BTC_{alpha}\)	\(BTC_{otc}\)	\(web-google\)
	Industry KG	[14]	[13]	[16]
Node count	116699	24185	35591	5105039
Relation count	5710	3782	5881	875713
Deg	40.875	12.789	12.103	11.659
Trans	0.012	0.063	0.045	0.449
CCO	0.044	0.158	0.151	0.369
\(\|c \|\)	61	79	87	29722
	\(sx-stackoverflow\)	\(web-amazon\)	\(web-facebook\)
	[24]	[22]	[27]
Node count	574795	925872	171002
Relation count	406972	334863	22470
Deg	2.824	5.529	15.220
Trans	0.001	0.103	0.124
CCO	0.001	0.198	0.179
\(\|c \|\)	90266	31916	790

1.3 Appendix C: Training Details and Results by KG and Model.

	Model	Epochs	Batch size	m/n	MR	MRR	\(H_{10}\)	CRMRR
\(web-google\)	TransR			0.381	6.070	0.142	0.887	0.221
	DistMult	510	32	0.137	49.370	0.024	0.362	0.389
	ComplEx			0.152	315.440	0.001	0.135	0.229
\(web-amazon\)	TransR			0.212	11.300	0.093	0.362	0.116
	DistMult	510	32	0.365	29.620	0.012	0.566	0.217
	ComplEx			0.195	508.480	0.009	0.112	0.368
\(sx-stackoverflow\)	TransR			0.262	7.150	0.074	0.791	0.145
	DistMult	510	32	0.165	38.400	0.025	0.498	0.285
	ComplEx			0.265	484.940	0.002	0.255	0.115
\(web-facebook\)	TransR			0.185	3.660	0.156	0.272	0.083
	DistMult	510	32	0.114	26.880	0.008	0.798	0.127
	ComplEx			0.098	795.670	0.001	0.302	0.136
Industry KG	TransR			0.134	8.312	0.124	0.727	0.189
	DistMult	510	32	0.119	54.854	0.018	0.578	0.283
	ComplEx			0.219	470.813	0.002	0.184	0.171
\(BTC_{alpha}\)	TransR			0.530	7.400	0.056	0.210	0.082
	DistMult	510	32	0.262	20.840	0.011	0.438	0.180
	ComplEx			0.235	404.900	0.004	0.076	0.119
\(BTC_{otc}\)	TransR			0.154	3.820	0.113	0.626	0.155
	DistMult	510	32	0.170	21.390	0.012	0.394	0.202
	ComplEx			0.32	348.40	0.001	0.134	0.081

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Schramm, S., Niklas, U., Schmid, U. (2023). Cluster Robust Inference for Embedding-Based Knowledge Graph Completion. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science(), vol 14117. Springer, Cham. https://doi.org/10.1007/978-3-031-40283-8_25

Download citation

DOI: https://doi.org/10.1007/978-3-031-40283-8_25
Published: 09 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40282-1
Online ISBN: 978-3-031-40283-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Cluster Robust Inference for Embedding-Based Knowledge Graph Completion

Abstract

Access this chapter

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

1.1 Appendix A: Relevant Notations of this Paper

1.2 Appendix B: Metrics of Used Knowledge Graphs

1.3 Appendix C: Training Details and Results by KG and Model.

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation