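In summary, to address the shortcomings of earlier outlier detection methods, this paper proposes a dynamic outlier detection algorithm for large network datasets based on classification and regression trees (CART). First, the regularities of anomalous data in the dataset are analyzed in depth, the data are preprocessed by grid partitioning to reduce computation time, and a CART decision tree classifies node attributes layer by layer. Next, a pruning operation replaces non-leaf nodes so that outliers are screened accurately, which preserves the integrity and usability of the large network dataset. Finally, simulation experiments demonstrate the advantages of the proposed method in terms of accuracy, speedup, and scalability. The method may serve as a reference for efficiently screening information in large datasets.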
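The two building blocks named above, grid-partition preprocessing and the CART split criterion, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the `cell_size` and `min_count` parameters, and the rule that sparse cells hold the outlier candidates are all assumptions made for illustration. The Gini impurity used in `best_split` is the standard CART classification criterion.

```python
from collections import defaultdict
from math import floor


def grid_partition(points, cell_size):
    """Map each grid cell (a tuple of integer indices) to the points inside it."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)
    return dict(cells)


def candidate_outliers(points, cell_size, min_count):
    """Return the points that fall in cells holding fewer than min_count points.

    Assumption: sparse cells contain the outlier candidates, so dense cells
    can be skipped in later, more expensive steps.
    """
    cells = grid_partition(points, cell_size)
    return [p for members in cells.values()
            if len(members) < min_count
            for p in members]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels, feature):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, score); threshold is None if the feature is constant.
    """
    best = (None, float("inf"))
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best
```

For example, with `points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (0.4, 0.3), (5.0, 5.0)]`, calling `candidate_outliers(points, 1.0, 2)` keeps only `(5.0, 5.0)`, because the other four points share one dense cell. A full CART tree would apply `best_split` recursively over all features and then prune subtrees that do not improve the result. That pruning step is omitted here because the source does not state the exact rule it uses.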