Abstract: Distinguishing the subtopics within a news topic is currently difficult. To address this problem, this paper presents a subtopic division method based on Latent Dirichlet Allocation (LDA) and Derived Partition. First, LDA is used to extract latent topics, and the θ matrix is converted into a full covering model by setting an appropriate threshold. Second, redundant topics are removed from the full covering model of the θ matrix by full covering granular reduction. Finally, intersection and symmetric difference operations are applied repeatedly to the reduced full covering until no new granules are generated. In comparison experiments against three baseline methods and the single-pass method on the Sohu News Corpus, the proposed method effectively reduces the error identification cost of news subtopic division.
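The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that θ is a document-topic probability matrix, that a document joins a topic's block when its probability reaches the threshold (falling back to its most probable topic so that the covering stays full), that a block is redundant when it equals the union of other blocks, and that the final subtopics are the minimal non-empty granules of the closure. The function names `theta_to_covering`, `reduce_covering`, and `derived_partition`, the toy matrix, and the threshold value are all hypothetical.

```python
from itertools import combinations

def theta_to_covering(theta, threshold):
    """Turn a document-topic matrix into one block of document ids per topic."""
    n_topics = len(theta[0])
    blocks = [set() for _ in range(n_topics)]
    for doc, row in enumerate(theta):
        hits = [k for k, p in enumerate(row) if p >= threshold]
        if not hits:  # assumption: fall back to the top topic so every doc is covered
            hits = [max(range(n_topics), key=row.__getitem__)]
        for k in hits:
            blocks[k].add(doc)
    return {frozenset(b) for b in blocks if b}

def reduce_covering(covering):
    """Drop each block that equals the union of other, smaller blocks."""
    reduced = set(covering)
    for block in sorted(covering, key=len, reverse=True):
        parts = [b for b in reduced if b < block]  # proper subsets still present
        if parts and frozenset().union(*parts) == block:
            reduced.discard(block)
    return reduced

def derived_partition(covering):
    """Close under intersection and symmetric difference, then keep minimal granules."""
    granules = set(covering)
    while True:
        new = set()
        for a, b in combinations(granules, 2):
            for g in (a & b, a ^ b):
                if g and g not in granules:
                    new.add(g)
        if not new:  # no new granules were generated
            break
        granules |= new
    # assumption: the subtopics are the granules with no smaller granule inside them
    atoms = [g for g in granules if not any(h < g for h in granules)]
    return sorted(sorted(g) for g in atoms)

# Toy matrix: 6 documents (rows) by 4 latent topics (columns), invented for illustration.
theta = [
    [0.60, 0.05, 0.05, 0.30],
    [0.35, 0.30, 0.05, 0.30],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
    [0.60, 0.05, 0.05, 0.30],
    [0.05, 0.85, 0.05, 0.05],
]
covering = theta_to_covering(theta, threshold=0.3)
reduced = reduce_covering(covering)
partition = derived_partition(reduced)
print(partition)  # [[0, 4], [1], [2, 5], [3]]
```

In this toy case the fourth topic's block {0, 1, 3, 4} is the union of the blocks {0, 1, 4} and {3}, so the reduction step removes it before the partition is derived. The closure loop is exponential in the worst case, so the sketch is only suitable for small examples.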
苏婧琼,刘建霞,谢珺,郝洁,任密蜂. 面向新闻文档的子话题划分方法研究[J]. 小型微型计算机系统, 2017, 38(8): 1850-1855.
SU Jing-qiong, LIU Jian-xia, XIE Jun, HAO Jie, REN Mi-feng. Research of Sub-topic Division Method in News Documents. Journal of Chinese Computer Systems, 2017, 38(8): 1850-1855.