Received: March 18, 2020
Abstract: To address the low detection performance of K-means-based anomaly detection, an anomaly detection algorithm combining information entropy with an improved K-means algorithm is proposed. The algorithm uniformly selects data objects whose density exceeds the average density of the data set as the initial cluster centers, which avoids random selection of the initial centers. On this basis, attribute weights determined by information entropy are used to compute the weighted Euclidean distance between each data point in a cluster and that cluster's center. Anomalies are detected by comparing each point's weighted Euclidean distance with the average weighted Euclidean distance of all points in its cluster. Experiments show that the improved algorithm achieves a higher detection rate and a lower false detection rate; applied to power load data, the detection rate reaches 90.5%, and abnormal load data are detected effectively.
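The detection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so several points here are assumptions rather than the authors' method: the weights use the common entropy weight method (each column normalized to a distribution, weight proportional to one minus the normalized entropy), the input is assumed non-negative with positive column sums, cluster labels and centers are taken as given instead of being produced by the density-based initialization, and the names `entropy_weights`, `detect_anomalies`, and the threshold multiplier `factor` are hypothetical.

```python
import numpy as np


def entropy_weights(X):
    """Attribute weights from the entropy weight method (assumed variant).

    Assumes non-negative data with a positive sum in every column.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Turn each attribute column into a distribution over the n objects.
    P = X / X.sum(axis=0)
    # Shannon entropy per attribute, scaled to [0, 1]; 0 * log(0) counts as 0.
    logP = np.log(P, where=P > 0, out=np.zeros_like(P))
    e = -(P * logP).sum(axis=0) / np.log(n)
    d = 1.0 - e  # lower entropy -> more discriminative attribute
    if d.sum() == 0:
        # Every attribute is uniform: fall back to equal weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return d / d.sum()


def detect_anomalies(X, labels, centers, factor=1.0):
    """Flag points whose weighted distance to their cluster center exceeds
    `factor` times the cluster's mean weighted distance.

    `factor` is a hypothetical knob; the abstract only says the two
    distances are compared.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    w = entropy_weights(X)
    # Weighted Euclidean distance of every point to its own cluster center.
    dist = np.sqrt((w * (X - centers[labels]) ** 2).sum(axis=1))
    flags = np.zeros(len(X), dtype=bool)
    for k in np.unique(labels):
        members = labels == k
        flags[members] = dist[members] > factor * dist[members].mean()
    return flags
```

Calling `detect_anomalies(X, labels, centers)` returns a boolean mask over the rows of `X`. In practice `labels` and `centers` would come from the improved K-means run, and `factor` would need tuning against labeled data.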
Keywords: information entropy; K-means; anomaly detection
Article ID: 20204012 CLC number: TP312 Document code:
Funding: National Natural Science Foundation of China (41672114).