Abstract
Existing data cleaning methods suffer from incomplete cleaning and low efficiency. To address these problems, this paper proposes a rapid cleaning method for time-series data based on the dynamic fusion of local outlier factors (LOF). The LOF of each data point is computed under several different values of k. The scores are normalized to remove the effect of differing scales (units of measurement) and are combined by ratio weighting into a composite LOF. A dynamic fusion strategy then evaluates the overall degree of abnormality of each data point so that anomalous data can be identified accurately. The cuckoo search algorithm is used to optimize the initial cluster centers of K-means, which avoids the tendency of standard K-means to become trapped in local optima. Deleting the anomalous data yields a dataset with missing values. The optimized K-means algorithm clusters this dataset and the cluster containing each missing position is identified. The mean of the data in that cluster is taken as an estimate of the original true value and substituted for the anomalous data, which completes the repair and cleaning. Test results show that the method achieves a higher cleaning comprehensiveness index, a smaller mean squared error on the cleaned data, and a lower cleaning time cost. This indicates that it can clean anomalous data from time series quickly, comprehensively, and accurately.
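The detection stage described above can be sketched in Python with NumPy. It covers LOF under several k values, normalization, ratio weighting and dynamic fusion. This is a minimal illustration under stated assumptions and is not the authors' implementation. The abstract does not give the normalization, the weighting formula or the fusion rule, so the sketch fills each gap with an assumption. It uses min-max normalization for each k. Each k is weighted by its share of the total score spread, as one possible reading of "ratio weighting". A point is flagged when its fused score exceeds mean + c·std, as one possible "dynamic" threshold. The function names, the k values (5, 10, 20) and c = 3 are hypothetical. The LOF itself follows the standard definition based on k-distance, reachability distance and local reachability density.

```python
import numpy as np

def lof_scores(X, k):
    """Standard local outlier factor of every point; ties at the k-th neighbour are ignored."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # treat a univariate series as 1-D points
        X = X[:, None]
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    k_dist = d[np.arange(n), nn[:, -1]]  # k-distance of every point
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(d[np.arange(n)[:, None], nn], k_dist[nn])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)   # local reachability density
    return lrd[nn].mean(axis=1) / lrd          # mean neighbour density / own density

def composite_lof(X, ks=(5, 10, 20)):
    """Fuse LOF scores computed with several k values into one score per point."""
    scores = np.array([lof_scores(X, k) for k in ks])     # shape (len(ks), n)
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    norm = (scores - lo) / (hi - lo + 1e-12)               # assumed: min-max per k
    spread = norm.std(axis=1)
    # assumed "ratio weighting": each k's share of the total score spread
    w = spread / spread.sum() if spread.sum() > 0 else np.full(len(ks), 1 / len(ks))
    return w @ norm

def detect_anomalies(X, ks=(5, 10, 20), c=3.0):
    """Flag points whose fused score exceeds a data-driven threshold (assumed rule)."""
    s = composite_lof(X, ks)
    return s > s.mean() + c * s.std(), s
```

For example, `detect_anomalies(np.sin(np.linspace(0, 4 * np.pi, 200)))` with two samples overwritten by large spikes should flag exactly those two samples. The sketch embeds each sample as a single value, which suits level jumps. Embedding sliding windows of neighbouring samples would be a natural variant when anomalies show up as changes in shape.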
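The repair stage can be sketched in the same way. Cuckoo search chooses the initial K-means centers, Lloyd iterations refine them, and each record with removed entries receives the mean of its cluster. This sketch rests on assumptions and does not reproduce the paper's code. It assumes multivariate records (rows are time steps, columns are variables) in which deleted anomalies are marked as NaN. Only the complete rows are clustered. An incomplete row is assigned to the nearest center using its observed columns only. The cuckoo search is a simplified version: Lévy flights use Mantegna's algorithm, nest components are discovered with probability `pa`, and a new solution is kept only if it lowers the K-means objective. All function names and parameter values (`n_nests`, `n_iter`, `pa`, `alpha`) are hypothetical.

```python
import math
import numpy as np

def sse(data, centers):
    """K-means objective: squared distance from each row to its nearest center, summed."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).sum()

def levy_step(rng, shape, beta=1.5):
    """Levy-flight step lengths drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0.0, sigma, shape) / np.abs(rng.normal(0.0, 1.0, shape)) ** (1 / beta)

def cuckoo_centers(data, k, n_nests=15, n_iter=100, pa=0.25, alpha=0.01, seed=0):
    """Simplified cuckoo search; each nest is one candidate (k, d) set of initial centers."""
    rng = np.random.default_rng(seed)
    lo, hi = data.min(axis=0), data.max(axis=0)
    nests = rng.uniform(lo, hi, size=(n_nests, k, data.shape[1]))
    fit = np.array([sse(data, c) for c in nests])
    for _ in range(n_iter):
        best = nests[fit.argmin()].copy()
        for i in range(n_nests):
            # Levy flight from nest i, scaled by its offset from the current best nest
            new = np.clip(nests[i] + alpha * levy_step(rng, nests[i].shape) * (nests[i] - best), lo, hi)
            j = rng.integers(n_nests)      # replace a random nest if the new egg is better
            f = sse(data, new)
            if f < fit[j]:
                nests[j], fit[j] = new, f
        # each component is discovered with probability pa and rebuilt by a random walk
        mask = rng.random(nests.shape) < pa
        walk = rng.random() * (nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)])
        trial = np.clip(nests + walk * mask, lo, hi)
        for i in range(n_nests):
            f = sse(data, trial[i])
            if f < fit[i]:
                nests[i], fit[i] = trial[i], f
    return nests[fit.argmin()].copy()

def kmeans(data, centers, n_iter=100):
    """Plain Lloyd iterations from the given initial centers."""
    for _ in range(n_iter):
        labels = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        new = np.array([data[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

def impute_by_cluster_mean(X, k, seed=0):
    """Fill NaNs (removed anomalies) with the mean of the cluster each row falls into."""
    X = np.array(X, dtype=float)           # work on a copy
    missing = np.isnan(X)
    incomplete = missing.any(axis=1)
    complete = X[~incomplete]
    centers, labels = kmeans(complete, cuckoo_centers(complete, k, seed=seed))
    for r in np.flatnonzero(incomplete):
        obs = ~missing[r]
        # nearest center, measured on the observed columns only
        c = ((centers[:, obs] - X[r, obs]) ** 2).sum(axis=1).argmin()
        members = complete[labels == c]
        X[r, ~obs] = members[:, ~obs].mean(axis=0) if len(members) else centers[c, ~obs]
    return X
```

When `impute_by_cluster_mean(X, k=2)` is called, each NaN is replaced by the corresponding column mean of the complete rows in the cluster that its row falls into. Clustering only the complete rows keeps the K-means objective well defined. Seeding the search through `seed` makes the repair reproducible.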
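Hao Fuzhong (郝福忠), Yang Yufang (杨宇方), Ji Zhe (姬哲), Zhang Jing (张静), Wang Junyi (王军义). A rapid cleaning method for time-series data based on dynamic fusion of local outlier factors (一种基于动态融合局部异常因子的时序数据快速清洗方法) [J]. Techniques of Automation and Applications (自动化技术与应用), 2026, 45(6): 135-139. DOI: 10.20033/j.1003-7241.(2026)06-0135-05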