Self‑representation based unsupervised feature selection can handle unlabeled data without being affected by pseudo‑labeling. To ensure that such methods simultaneously achieve good robustness, preserve the local structure of samples, and select the most representative features, a new approach is proposed, and a corresponding iterative optimization algorithm is designed to compute its objective function. The method first identifies and processes outliers of samples, then combines the traditional self‑representation model with non‑convex sparse constraint and manifold regularization to form the target model, and puts the preprocessed data into the model for feature selection. Finally, the method uses the selected features for clustering. The proposed method is compared with seven methods on nine real data sets for experiments, and the experimental results show that the proposed method can effectively solve the unsupervised feature selection problem.
其中:是基于余弦相似的相似性矩阵[11]; D 是一个对角元素为的对角矩阵;是拉普拉斯矩阵.将RL与式(4)结合得到最终的优化目标模型,其中,是调节参数.
2.3 优化算法
式(8)中的非凸目标函数很难直接求解,但代入式(5)后,可以将其转换为如下两个凸函数相减的模型:
这种凸函数相减函数(difference of convex functions,DC)可以通过收敛的凹凸过程[17](concave‑convex procedure,CCCP)来求解,下面简单介绍其步骤.设为可导的DC函数,存在凸函数和使.目标函数为,其中.设是可微的,CCCP本质上是求解这样的一组迭代子问题[18]:,其中为迭代次数.
参考上述求解过程,式(9)可被转换为如下一系列凸子问题:
其中 .根据式(10),需要解决的问题可简化为
式(11)中的优化问题是非光滑的,基于交替乘子法(alternating direction method of multipliers,ADMM)[19]的思想提出解决该问题的算法.首先,引入辅助变量,将问题(11)转化为如下等式约束优化问题:
表2和表3展示了在不同数据集上所有算法在其最优参数下的聚类结果的平均值和标准差.其中最佳的结果用粗体标记,次佳的结果用下划线标记,括号里是所选特征的数量.通过分析表2和表3,可以得出以下结论:① SSRMR总体上优于NOVRSR.验证了保存样本间的局部结构是有助于特征选择的; ② 在大多数数据集中,SSRMR相较对比方法表现更好,证明了其有效性.在某些情况下,SSRMR的结果不是最佳的,但也和最佳结果接近.此外,SSRMR在某些情况下有显著的改善,例如在nci9上,其ACC和NMI分别比次佳的结果高6%左右,在warpPIE10P上的NMI比次佳的结果高11%; ③ 基线方法和UFS方法的比较说明了特征选择的有效性.基线方法总体表现并不差,但它需要所有原始特征,当数据的维度很高时使用此法并不是一个明智的选择.相比之下,UFS方法只需要很少的特征就可以获得类似甚至更好的结果.
HeX F, CaiD, NiyogiP.Laplacian score for feature selection[C]// Proceedings of the 18th International Conference on Neural Information Processing Systems.Vancouver,2005:507-514.
[3]
YangY, ShenH T, MaZ G,et al .L 2,1‑norm regularized discriminative feature selection for unsupervised learning[C]// Proceedings of the Twenty‑Second International Joint Conference on Artificial Intelligence-Volume Two.Menlo Park:AAAI Press,2011:1589-1594.
[4]
ZhaoZ, LiuH.Spectral feature selection for supervised and unsupervised learning[C]// Proceedings of the 24th International Conference on Machine Learning.Corvalis,2007:1151-1157.
[5]
CaiD, ZhangC, HeX F.Unsupervised feature selection for multi‑cluster data[C]// Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Washington DC,2010:333-342.
LiZhan‑shaan, LiuZhao‑geng, YuYin,et al.A quantized pheromone ant colony optimization algorithm for feature selection[J].Journal of Northeastern University(Natural Science),2020,41(1):17-22.
[8]
HuR Y, ZhuX F, ChengD B,et al.Graph self‑representation method for unsupervised feature selection[J].Neurocomputing,2017,220:130-137.
[9]
NieF P, ZhuW, LiX L.Unsupervised feature selection with structured graph optimization[C]//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.Phoenix,2016:1302-1308.
LiuX W, WangL, ZhangJ,et al.Global and local structure preservation for feature selection[J].IEEE Transactions on Neural Networks and Learning Systems,2013,25(6):1083-1095.
[12]
LiW Y, ChenH M, LiT R,et al.Unsupervised feature selection via self‑paced learning and low‑redundant regularization[J].Knowledge‑Based Systems,2022,240:108150.
[13]
ParsaM, ZareH, GhateeM.Unsupervised feature selection based on adaptive similarity learning and subspace clustering[J].Engineering Applications of Artificial Intelligence,2020,95:103855.
MiaoJ Y, PingY, ChenZ S,et al.Unsupervised feature selection by non‑convex regularized self‑representation[J].Expert Systems with Applications,2021,173:114643.
[16]
ShiY, MiaoJ Y, WangZ Y,et al.Feature selection with l 2,1 - 2 regularization[J].IEEE Transactions on Neural Networks and Learning Systems,2018,29(10):4967-4982.
[17]
BottmerL, CrouxC, WilmsI.Sparse regression for large data sets with outliers[J].European Journal of Operational Research,2022,297(2):782-794.
SriperumbudurB, LanckrietG.On the convergence of the concave-convex procedure[C]// Proceedings of the 22nd International Conference on Neural Information Processing Systems.Vancouver,2009:1759-1767.
[20]
BoydS, ParikhN, ChuE,et al.Distributed optimization and statistical learning via the alternating direction method of multipliers[J].Foundations and Trends® in Machine Learning,2011,3(1):1-122.