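Chinese abstract (translated): To extract knowledge efficiently from massive text collections, a Chinese keyword extraction method based on contextual relations and the TextRank algorithm is proposed. First, a traditional method extracts an initial set of keywords. Next, mutual information is used to select the words that depend strongly on those keywords in context, and these become candidate keywords. Finally, the TextRank algorithm computes the feature keywords that best express the topic of the text. Experimental results show that, compared with traditional methods, the proposed algorithm improves precision, recall, and related metrics.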
Chinese keywords (translated): keyword extraction; contextual relations; mutual information; TextRank algorithm
Abstract: A new keyword extraction method based on context and the TextRank algorithm is proposed to extract knowledge efficiently from massive text collections. First, the algorithm uses mutual information to select words that depend on the keywords in context and adds them to the candidate keyword set. Then it uses the TextRank algorithm to select the words that best express the theme of the text. The results show that the algorithm improves precision and recall.
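The three steps named in the abstract (initial keywords, mutual-information filtering, TextRank ranking) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the abstract does not specify the "traditional method", the window size, the mutual-information threshold, or the graph weighting, so plain term frequency, `window=3`, `pmi_threshold=0.0`, and co-occurrence counts as edge weights are all assumptions. All function names are hypothetical, and the input is assumed to be a list of already segmented words.

```python
import math
from collections import Counter, defaultdict


def cooccurrence(tokens, window=3):
    """Count unordered co-occurrences of distinct words.

    `window` covers the current word plus the following `window - 1` words.
    """
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                pairs[frozenset((w, v))] += 1
    return pairs


def pmi(pairs, counts, total, a, b):
    """Rough pointwise mutual information of a and b (0.0 if they never co-occur).

    Assumption: joint and marginal probabilities share the normalizer `total`.
    """
    joint = pairs.get(frozenset((a, b)), 0)
    if joint == 0:
        return 0.0
    return math.log(joint * total / (counts[a] * counts[b]))


def textrank(nodes, pairs, damping=0.85, iterations=50):
    """Weighted TextRank: PageRank on an undirected co-occurrence graph."""
    nodes = set(nodes)
    weights = defaultdict(dict)
    for pair, w in pairs.items():
        a, b = tuple(pair)
        if a in nodes and b in nodes:
            weights[a][b] = w
            weights[b][a] = w
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            # Each neighbor m passes on its score in proportion to the edge weight.
            rank = sum(
                scores[m] * w / sum(weights[m].values())
                for m, w in weights[n].items()
            )
            new_scores[n] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def extract_keywords(tokens, n_initial=2, pmi_threshold=0.0, top_k=3, window=3):
    counts = Counter(tokens)
    total = len(tokens)
    pairs = cooccurrence(tokens, window)
    # Step 1 (assumption): term frequency stands in for the "traditional method".
    initial = [w for w, _ in counts.most_common(n_initial)]
    # Step 2: keep words whose mutual information with an initial keyword is high.
    candidates = set(initial)
    for w in counts:
        if any(
            pmi(pairs, counts, total, w, k) > pmi_threshold
            for k in initial
            if k != w
        ):
            candidates.add(w)
    # Step 3: rank the candidates with TextRank and return the top ones.
    scores = textrank(candidates, pairs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = ["keyword", "extraction", "uses", "textrank", "keyword",
          "graph", "textrank", "ranks", "keyword", "candidates"]
keywords = extract_keywords(tokens)
```

For Chinese input, the token list would first have to come from a word segmenter. The thresholds above are placeholders and would need tuning against a labeled corpus before any comparison of precision and recall.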
Article ID: 20176019 CLC number: Document code:
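Author | Affiliation | Email |
DU Haizhou | Shanghai University of Electric Power | |
CHEN Zhengbo | Shanghai University of Electric Power | townwave@163.com |
ZHONG Konglu | Zhejiang Huayun Electric Power Engineering Design Consulting Co., Ltd. | |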
DU Haizhou, CHEN Zhengbo, ZHONG Konglu. Keyword Extraction Method Based on Context and TextRank Algorithm[J]. Journal of Shanghai University of Electric Power, 2017, 33(6): 607-612.