2026, 47(5): 1108-1116.
To address the limited ability of existing temporal knowledge graph reasoning methods to capture long-distance dependencies and their lack of interpretability, a temporal knowledge graph completion model combining Transformer and reinforcement learning is proposed. The model uses reinforcement learning to design a new, highly interpretable policy network composed of three core components: a time-aware encoder, a path context encoder, and an action scorer. First, the time-aware encoder uses a self-attention mechanism to embed time information into the relation representations, which strengthens the handling of temporal dynamics. Second, the path context encoder uses a Transformer to efficiently encode historical event sequences and capture long-distance dependencies. Third, the action scorer uses a bidirectional gated recurrent unit (BiGRU) to predict actions, which improves prediction accuracy. In addition, to alleviate reward sparsity, the model introduces a new reward function that jointly considers a time-shaping reward, a path length reward, and a path diversity reward, providing finer-grained feedback for optimizing path selection. The proposed model is compared with existing advanced methods on four public datasets, and the results show that it improves over the baseline methods on the MRR and Hits@k metrics.
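
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how MRR and Hits@k are conventionally computed from the rank of the correct entity for each test query. The function names and the example ranks are illustrative assumptions and are not taken from the paper or its experiments.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; ranks are 1-based."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries
# (made-up values for illustration only).
example_ranks = [1, 2, 4, 10]

mrr = mean_reciprocal_rank(example_ranks)
hits1 = hits_at_k(example_ranks, 1)
hits3 = hits_at_k(example_ranks, 3)
hits10 = hits_at_k(example_ranks, 10)
```

Both metrics lie in the interval (0, 1], and higher is better. MRR rewards placing the correct entity near the top of the ranking, while Hits@k only checks whether it falls within the first k candidates.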