Inductive link prediction models for knowledge graphs require both positive and negative triplets during training. However, the commonly used random negative sampling tends to produce low-quality negative triplets, which weakens the model's feature learning ability. To address this problem, a similarity-based negative sampling method is proposed. First, the set of k-hop neighbor nodes around the entity to be replaced is obtained. Second, entities with high similarity to that entity are selected from the set to replace the head or tail entity of the original triplet, generating a negative triplet. Finally, the negative sampling method is applied to an inductive link prediction model, and inductive link prediction experiments are carried out on the WN18RR and FB15K-237 datasets. Experimental results show that, compared with other models, the MRR metric increases by up to 10.47 percentage points and the Hits@10 metric by up to 16.02 percentage points. Furthermore, an analysis of negative sample quality shows that the method generates high-quality negative triplets, which improves model performance.
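The method consists of two parts: candidate set generation and similarity computation. Candidate set generation selects, from the entity set E, the entities e that can form negative triplets; these entities make up the candidate set Ec. Take the generation of negative triplets for a triplet (h, r, t) as an example. First, the triplets in the KG are partitioned by relation to obtain the set of all triplets involving relation r. Second, the adjacency matrix Ar for relation r is constructed, where Ar[i][j] indicates that relation r holds between entity i and entity j. Finally, for the given triplet (h, r, t), the k-hop neighbor set M of the head entity h or the tail entity t is obtained from the adjacency matrix Ar.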
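The candidate set generation step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the adjacency matrix Ar is stored sparsely as a dictionary of neighbor sets, edges are treated as undirected, and the neighborhoods of h and t are merged with h and t themselves excluded. All of these choices are assumptions not stated in the text.

```python
from collections import defaultdict, deque


def build_relation_adjacency(triples):
    """Partition triples by relation: {relation: {entity: set(neighbors)}}."""
    adj = defaultdict(lambda: defaultdict(set))
    for h, r, t in triples:
        # Assumption: edges are treated as undirected for neighbor lookup.
        adj[r][h].add(t)
        adj[r][t].add(h)
    return adj


def k_hop_neighbors(adj_r, start, k):
    """Entities reachable from `start` in at most k hops (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj_r.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def candidate_set(triples, triple, k):
    """k-hop neighbors of h or t under relation r, without h and t."""
    h, r, t = triple
    adj_r = build_relation_adjacency(triples)[r]
    # Assumption: the two neighborhoods are combined by set union.
    neighbors = k_hop_neighbors(adj_r, h, k) | k_hop_neighbors(adj_r, t, k)
    return neighbors - {h, t}


if __name__ == "__main__":
    kg = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d"), ("x", "s", "y")]
    print(sorted(candidate_set(kg, ("a", "r", "b"), k=2)))
```

A breadth-first search over a sparse structure avoids materializing a dense |E| x |E| matrix per relation, which matters for graphs the size of FB15K-237.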
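Because a KG contains a large number of entities, the k-hop neighbor set M also contains many entities. To further select high-quality entities for replacement, the similarity between each entity in M and the entity to be replaced in the original positive triplet is computed. The entities are then sorted in descending order of similarity score, and the top m are selected as replacements to form high-quality negative triplets. Cosine similarity is used as the similarity measure.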
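The ranking step can be illustrated with the standard definition of cosine similarity, cos(u, v) = (u · v) / (‖u‖ ‖v‖). The sketch below assumes that every entity has a feature vector available in a dictionary named embeddings; that dictionary, the function names, and the alphabetical tie-breaking are hypothetical choices made for the example, not details given in the text.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_m_similar(target, candidates, embeddings, m):
    """Rank candidates by similarity to `target` (descending), keep the top m."""
    scored = [
        (cosine_similarity(embeddings[target], embeddings[c]), c)
        for c in sorted(candidates)  # sorted for a deterministic tie order
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:m]]


def corrupt(triple, replacements, replace_head):
    """Build negative triplets by swapping the head or the tail entity."""
    h, r, t = triple
    return [(e, r, t) if replace_head else (h, r, e) for e in replacements]


if __name__ == "__main__":
    # Toy vectors; in practice these would come from the model (assumption).
    embeddings = {
        "t": [1.0, 0.0],
        "a": [0.9, 0.1],
        "b": [0.0, 1.0],
        "c": [-1.0, 0.0],
    }
    best = top_m_similar("t", {"a", "b", "c"}, embeddings, m=2)
    print(corrupt(("h", "r", "t"), best, replace_head=False))
```

Selecting only the top m candidates keeps the negatives close to the replaced entity, which is the property the method relies on to make them harder to distinguish than randomly sampled ones.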