Abstract
A fuzzy border-peeling clustering (FBP) algorithm is proposed. First, a dynamic density estimation method based on the Cauchy kernel computes the density of each data point. Second, a layer-by-layer peeling strategy separates the border data from the core data. Third, reachability between core data points is used to cluster the core region. Finally, a fuzzy assignment strategy produces a soft partition of the border data. FBP is compared with 10 benchmark algorithms (6 density-based clustering algorithms and 4 fuzzy clustering algorithms) on artificial and real-world datasets. The experimental results show that, averaged over all datasets, FBP improves the adjusted Rand index (ARI) by 21% to 60% and the normalized mutual information (NMI) by 12% to 47%. Border-peeling clustering optimized with the Cauchy kernel and a fuzzy assignment strategy therefore significantly improves clustering accuracy.
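The four steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the bandwidth `sigma`, the peeling fraction `frac`, the connection radius `eps`, the fuzzy c-means style membership rule, and all function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def pairwise(A, B):
    """Euclidean distance matrix between the rows of A and the rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)


def cauchy_density(X, k):
    """Sum of Cauchy-kernel weights 1 / (1 + (d / sigma)^2) over the k nearest neighbours.

    sigma (the mean k-NN distance) is an assumed bandwidth, not taken from the paper.
    """
    d = np.sort(pairwise(X, X), axis=1)[:, 1:k + 1]  # column 0 is the point itself
    sigma = d.mean()
    return (1.0 / (1.0 + (d / sigma) ** 2)).sum(axis=1)


def peel(X, k=5, rounds=3, frac=0.1):
    """Boolean mask of core points left after peeling the sparsest points layer by layer."""
    core = np.ones(len(X), dtype=bool)
    for _ in range(rounds):
        ids = np.flatnonzero(core)
        dens = cauchy_density(X[ids], k)  # density is re-estimated on the remaining points
        core[ids[dens <= np.quantile(dens, frac)]] = False
    return core


def cluster_core(Xc, eps):
    """Label connected components of the graph linking core points at distance <= eps."""
    d = pairwise(Xc, Xc)
    labels = np.full(len(Xc), -1)
    n_clusters = 0
    for start in range(len(Xc)):
        if labels[start] >= 0:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((d[i] <= eps) & (labels < 0)):
                labels[j] = n_clusters
                stack.append(j)
        n_clusters += 1
    return labels


def fuzzy_memberships(X, core, core_labels, m=2.0):
    """One-hot rows for core points; fuzzy c-means style rows for border points (assumed rule)."""
    n_clusters = core_labels.max() + 1
    U = np.zeros((len(X), n_clusters))
    U[np.flatnonzero(core), core_labels] = 1.0
    d = pairwise(X[~core], X[core])
    # Distance from each border point to the nearest core point of every cluster.
    near = np.stack([d[:, core_labels == c].min(axis=1) for c in range(n_clusters)], axis=1)
    w = 1.0 / np.maximum(near, 1e-12) ** (2.0 / (m - 1.0))
    U[~core] = w / w.sum(axis=1, keepdims=True)
    return U


# Toy data: two well-separated Gaussian blobs of 40 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(5.0, 0.3, (40, 2))])
core = peel(X)
core_labels = cluster_core(X[core], eps=1.5)
U = fuzzy_memberships(X, core, core_labels)
print(U.shape, int(core.sum()))
```

Each row of `U` holds one point's membership degrees and sums to 1. Core points receive a hard (one-hot) label, while peeled border points receive graded memberships, which is the soft partition the abstract refers to.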
Jiarui SUN, Mingjing DU*
School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221100, Jiangsu, China
* Corresponding author: Mingjing DU. E-mail: dumj@jsnu.edu.cn
Jiarui SUN (born 1997), male, master's student. Research interests: machine learning and data mining. E-mail: sunjr@jsnu.edu.cn
SUN Jiarui, DU Mingjing. Fuzzy border-peeling clustering[J]. Journal of Shandong University (Natural Science), 2024, 59(3): 27-36. DOI: 10.6040/j.issn.1671-9352.4.2023.040
Funding
National Natural Science Foundation of China (62006104)
Jiangsu Normal University Postgraduate Research and Innovation Project (2022XKT1528)