Received: March 18, 2020
Abstract: Compared with traditional print media, data in online media is updated quickly, involves high user participation, and has wide coverage. Helping users grasp the topics discussed in online media in a short time is a problem in urgent need of study. At present, text topic clustering techniques are not yet mature and remain under active investigation in China, especially for Chinese text. This paper gives a systematic overview of the state of topic detection research in China and abroad, the basic steps of topic mining, and the advantages and disadvantages of clustering algorithms, and it points out the shortcomings of current methods and directions for future research.
Keywords: clustering; Chinese text; topic detection; topic mining
Article ID: 20210618 CLC number: TP391 Document code:
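Funding: National Natural Science Foundation of China (61272437, 61305094); "Chenguang Program" of the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission (13CG58).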