Method of Extractive Summarization Chinese Long Documents
WANG Zong-hui(1,2), LI Bao-an(1,2), LV Xue-qiang(1), YOU Xin-dong(1)
(1) Beijing Key Laboratory of Internet Culture & Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
(2) School of Computer, Beijing Information Science & Technology University, Beijing 100101, China
Abstract: Text summarization is one of the most important research topics in natural language processing and has become a research hotspot with the rise of deep learning. Extractive summarization of long Chinese text, however, faces greater challenges: long-document summarization corpora are insufficient, extracted information is inaccurate, target summaries are redundant, and summary sentences are missed. Taking extractive summarization of long Chinese text as its research object, this paper proposes the BETES method. First, a Chinese long text-summary corpus is constructed using rules and manually assisted filtering. Second, a pre-trained BERT model is used for text vectorization, to better capture the contextual semantics of long text and improve the accuracy of information extraction. Third, the Elementary Discourse Units (EDUs) of the Chinese long text are recognized and taken as the extraction objects, which reduces summary redundancy. Finally, a Transformer neural network extraction model is used to extract the EDUs, improving the accuracy of summary sentence extraction. Experiments show that the proposed BETES method improves accuracy and reduces redundancy when summarizing long Chinese text, and that its ROUGE scores are superior to those of mainstream extractive summarization methods.
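The redundancy argument in the abstract can be illustrated with a small sketch. It does not reproduce BETES, which relies on BERT and a Transformer extractor. It only compares sentence-level and clause-level extraction units on a toy document. The punctuation-based split is a crude stand-in for EDU recognition, the character-level ROUGE-1 is a simplified scorer, and the names `split_units` and `rouge1` and the example text are all hypothetical.

```python
import re
from collections import Counter

def split_units(text, delimiters):
    """Split text at any of the delimiter characters, dropping empty pieces."""
    parts = re.split("[" + re.escape(delimiters) + "]", text)
    return [p.strip() for p in parts if p.strip()]

def rouge1(candidate, reference):
    """Character-level ROUGE-1 as (precision, recall, F1) with clipped counts.

    Every character, including punctuation left inside a unit, is a unigram.
    """
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return p, r, 2 * p * r / (p + r)

# Toy document (assumption): one sentence mixes a key clause with background.
document = "模型提取关键信息，同时背景介绍很长。实验验证了方法。"
reference = "模型提取关键信息"

# Sentence-level units versus finer clause-level units (EDU stand-in).
sentences = split_units(document, "。！？")
clauses = split_units(document, "，；。！？")

# Pick the single unit that best matches the reference summary.
best_sentence = max(sentences, key=lambda u: rouge1(u, reference)[2])
best_clause = max(clauses, key=lambda u: rouge1(u, reference)[2])

print("sentence:", best_sentence, rouge1(best_sentence, reference))
print("clause:  ", best_clause, rouge1(best_clause, reference))
```

In this toy case the best sentence carries the background clause along with the key one, so its precision drops, while the best clause matches the reference exactly. Real EDU segmentation is a learned task and cannot be reduced to punctuation rules.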
王宗辉,李宝安,吕学强,游新冬. BETES:一种中文长文档抽取式摘要方法[J]. 小型微型计算机系统, 2022, 43(1): 42-49.
WANG Zong-hui, LI Bao-an, LV Xue-qiang, YOU Xin-dong. Method of Extractive Summarization Chinese Long Documents. Journal of Chinese Computer Systems, 2022, 43(1): 42-49.