(Department of Cyber Space Security, Information and Engineering University, Zhengzhou 450000, China)
(State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China)
Abstract: Semantic similarity calculation plays an important role in natural language processing. In recent years, with the development of deep learning, using word embeddings to compute semantic similarity has become widespread. Many models for computing word embeddings have been proposed, but these models map each word to a single embedding. Because natural language contains many polysemous words, such models cannot properly capture the distinct senses of those words. We propose a polysemous word embedding model that combines a topic model with a conventional word embedding model. First, we use the topic model to annotate the corpus semantically; then we treat each annotated word as a new word and apply the conventional word embedding method to the annotated corpus; finally, we obtain multiple embeddings for each polysemous word. We conduct experiments on both Chinese and English corpora. The results show that our model produces multiple embeddings for polysemous words and significantly improves the accuracy of semantic similarity calculation.
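The annotate-then-embed pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-token topic ids are assigned by hand here (in practice they would come from a topic model such as LDA), the `word#topic` naming scheme is an assumption, and a simple co-occurrence count vector stands in for a trained word embedding model. The function names `annotate`, `cooccurrence_vectors`, and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
import math


def annotate(tokens, topics):
    """Fuse each token with its topic id, e.g. 'bank' under topic 1 -> 'bank#1'."""
    if len(tokens) != len(topics):
        raise ValueError("tokens and topics must have the same length")
    return [f"{w}#{t}" for w, t in zip(tokens, topics)]


def cooccurrence_vectors(sentences, window=2):
    """Build count-based context vectors (a stand-in for a trained embedding model)."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy corpus with hand-assigned topic ids (0 = finance, 1 = river).
corpus = [
    (["deposit", "money", "bank"], [0, 0, 0]),
    (["river", "bank", "water"], [1, 1, 1]),
]

# Step 1: semantic annotation. Step 2: treat each annotated token as a new word.
annotated = [annotate(tokens, topics) for tokens, topics in corpus]

# Step 3: compute vectors on the annotated corpus; 'bank' now has two entries.
vectors = cooccurrence_vectors(annotated)
bank_senses = sorted(w for w in vectors if w.startswith("bank#"))
```

After annotation, `bank#0` and `bank#1` are distinct vocabulary items, so any embedding method run on the annotated corpus yields one vector per sense. In this sketch every token is annotated; annotating only the polysemous words would be an equally plausible reading of the abstract.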