School of Computer Science and Software Engineering,Southwest Petroleum University,Chengdu,610500,China
Show less
文章历史+
Received
Accepted
Published
2025-10-21
Issue Date
2026-04-30
PDF (1467K)
摘要
大规模图上的频繁模式挖掘(Frequent Pattern Mining,FPM)因其在社交分析等各类应用中的广泛应用,受到高度关注.受模式语义的约束,传统的FPM技术难以满足多样化的数据分析需求,基于此,提出有参模式(Parameterized Patterns,p⁃模式)及其挖掘技术.通过在模式中引入参数来拓展匹配语义,实现图中复杂关系的有效捕获;设计并实现了高效的挖掘算法(PMiner),用于从大图中发现频繁p⁃模式;提出基于p⁃模式的图关联规则(Graph Association Rule,GAR),并设计GAR挖掘算法GARGen,以发现图中节点间的潜在关联.在真实图数据上的实验不仅验证了上述算法的运行效率,还揭示了p⁃模式与传统模式的差异以及定义在p⁃模式上的GAR在链接预测等任务上的有效性.
Abstract
Frequent Pattern Mining (FPM) on large⁃scale graphs has garnered significant attention due to its broad applications in areas such as social network analysis. However,constrained by traditional pattern semantics,conventional FPM techniques struggle to meet the diverse demands of data analysis. To address this challenge,this paper introduces Parameterized Patterns (p⁃patterns) and their mining framework. By incorporating parameters into patterns,the matching semantics are extended,enabling the effective capture of complex relationships within graphs. An efficient mining algorithm,PMiner,is designed and implemented to discover frequent p⁃patterns from large graphs. Furthermore,this paper proposes Graph Association Rules (GAR) based on p⁃patterns and designs the GARGen algorithm to uncover latent associations between nodes. Experiments on real⁃world graph datasets not only validate the computational efficiency of the proposed algorithms but also highlight the distinctions between p⁃patterns and traditional patterns,as well as the effectiveness of GAR in tasks such as link prediction.
近年来,单一大图上的FPM引起了广泛关注,催生出众多先进的挖掘算法.值得注意的是,Elseidy et al[3]将FPM重新定义为约束满足问题,设计了GraMi算法,特别地,该算法利用具有反单调性的最小图像作为支持度度量来评估模式的频率.由于GraMi专为单一大图上的挖掘任务设计,因而面临计算资源消耗大的问题.为了提高挖掘效率,ScaleMine[4]采用了两阶段挖掘方法,首先通过近似处理快速识别频繁模式,随后通过并行挖掘对剩余模式进行挖掘,高效地解决FPM问题.为了进一步提升FPM的挖掘效率,引入并行处理技术,如GraphINC[5],TopKWY[6]和Peregrine[7]等算法通过与分布式系统结合显著提高了挖掘效率.然而,分布式系统自身也带来了复杂和高成本的问题.为此,Yuan et al[8]的T⁃FSM算法通过加载任务驱动的执行引擎以及多种优化技术的结合,不仅实现了较低的运行时间和内存消耗,也实现了高效的负载平衡.汤小春等[9]提出基于数据流的FPM算法,通过“微批”模式和优化的正规编码计算提高了并行性,且能够处理大规模图数据,然而,大量的子图实例的存储给内存管理带来了挑战.针对此问题,Cheng et al[10]的可扩展频繁模式挖掘算法通过引入数据间的关联性及内存管理优化策略,有效解决了大数据探索中的可扩展性问题,使算法能在内存有限的情况下高效运行.为了适应不同的应用场景对算法进行扩展,挖掘目标也从通用模式转变成特定模式.例如郭方方等[11]针对动态污点分析中存在的效率低下的问题,设计了SPIN⁃MGM算法来挖掘最大频繁子图,将生成树与最大频繁子图挖掘结合,显著提升了识别效率.随着图数据规模的不断增长,传统的挖掘算法已经难以满足需求.为了应对多重图上的挖掘挑战,Ingalalli et al[12]提出了MuGram算法,不仅能在多重图上发现频繁多重图模式,也能满足普通图上的挖掘需求.针对加权图上的挖掘问题,Islam et al[13]的WFSM⁃MaxPWS框架引入了MaxPWS剪枝技术,实现了在动、静态权重图中加权频繁子图的高效挖掘.Min et al[14]提出一个新的频繁模式发现算法,将字母表分为强、中、弱三个组件来识别一种新型的模式,即tri⁃模式.Wu et al[15]提出基于序列模式的高平均效用不重叠序列模式(HANP),实验证明,和现有的频繁序列模式相比,HANP具有更高的挖掘价值.这些特定模式大大增强了频繁模式挖掘的语义丰富性,能够更好地捕捉数据中的潜在关系.截至目前,尚未发现可直接用于挖掘p⁃模式的算法,但前述的频繁模式挖掘算法为研究p⁃模式挖掘问题提供了宝贵的启示和思路.
1.2 关联规则挖掘及应用
关联规则挖掘旨在识别数据集中项集间的关联关系,定义了面向交易数据的关联规则,其中,最具代表性的关联规则挖掘算法是Apriori算法[10].随着图数据的广泛使用,面向图数据的关联分析受到越来越多的关注.在早期的社交网络关联分析中[16],主要是基于社交图中提取的属性元组来挖掘传统规则和霍恩规则,没有充分考虑规则的结构特性.Fan et al[2]将关联规则从项集拓展到图模式,引入了特殊类型的图关联规则(Graph Pattern Association Rule,GPAR)用于开展社交媒体营销等,而GPAR中的后继被定义为包含单边的特殊模式图.在此基础上,Wang et al[16]进一步提出代表性的GPAR,形式化了GPAR挖掘问题并将其分为两个子问题,即频繁模式挖掘和规则生成,高效地解决了关联规则挖掘中的挑战.此外,Liu et al[17]拓展了传统GPAR的语义,提出GROs (Graph Rule with Oracles),通过引入内部计算与外部知识集成机制以及基于枢轴双模拟的匹配方式,解决了传统图规则在表达能力与计算效率上的瓶颈.受益于GPAR挖掘问题的有效突破,大图上的链接预测问题也得到了有效解决.链接预测旨在推断网络结构中节点间存在关联的可能性,在社交推荐等场景中至关重要,传统方法大多通过节点特征相似性度量来实现[18].近年来提出了一系列链接预测技术,其中,Benhidour et al[19]提出面向有向网络的链接预测法,包括相似度⁃流行度算法、路径探索算法以及混合算法,填补了有向网络上链接预测问题的研究空白.然而,这类算法依然面临计算复杂度高的挑战.此外,焦鹏飞等[18]提出一种基于多视图对比学习的动态网络链接预测方法,实现了动态网络的表示学习和链接预测.
本文通过引入p⁃模式扩展了传统模式的语义,并进一步将p⁃模式融入图关联规则(Graph Association Rule,GAR),丰富了GAR的表达能力,强化了其在链接预测上的作用.
DaudN N, Ab HamidS H, SaadoonM,et al. Applications of link prediction in social networks:A review. Journal of Network and Computer Applications,2020,166:102716.
[2]
FanW F, WangX, WuY H,et al. Association rules with graph patterns. Proceedings of the VLDB Endowment,2015,8(12):1502-1513.
[3]
ElseidyM, AbdelhamidE, SkiadopoulosS,et al. GraMi:Frequent subgraph and pattern mining in a single large graph. Proceedings of the VLDB Endowment,2014,7(7):517-528.
[4]
AbdelhamidE, AbdelazizI, KalnisP,et al. ScaleMine:Scalable parallel frequent subgraph mining in a single large graph∥Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis. Salt Lake City,UT,USA:IEEE,2016:716-727.
[5]
HusseinR, LernerA, RyserA,et al. GraphINC:Graph pattern mining at network speed. Proceedings of the ACM on Management of Data,2023,1(2):1-28.
[6]
Jyoti, KailasamS, BuzmakovA. A scalable,distributed framework for significant subgroup discovery. Knowledge⁃Based Systems,2024,284:111335.
[7]
JamshidiK, MahadasaR, VoraK. Peregrine:A pattern⁃aware graph mining system∥Proceedings of the 15th European Conference on Computer Systems. Heraklion,Greece:Association for Computing Machinery,2020:1-16.
[8]
YuanL H, YanD, QuW W,et al. T⁃FSM:A task⁃based system for massively parallel frequent subgraph pattern mining from a big graph. Proceedings of the ACM on Management of Data,2023,1(1):1-26.
BenhidourH, AlmeshkhasL, KerracheS. Link prediction in directed complex networks:Combining similarity⁃popularity and path patterns mining. Applied Intelligence,2024,54(17):8634-8665.
[20]
WangX, LanZ, HeY A,et al. A cost⁃effective approach for mining near⁃optimal top⁃k patterns. Expert Systems with Applications,2022,202:117262.
[21]
LinP, SongQ, ShenJ L,et al. Discovering graph patterns for fact checking in knowledge graphs∥Database Systems for Advanced Applications. Cham:Springer,2018:783-801.
[22]
Diaz⁃GarciaJ A, RuizM D, Martin⁃BautistaM J. A survey on the use of association rules mining techniques in textual social media. Artificial Intelligence Review,2023,56(2):1175-1200.
[23]
GongN Z, XuW C, HuangL,et al. Evolution of social⁃attribute networks:Measurements,modeling,and implications using Google+∥Proceedings of the 2012 Internet Measurement Conference.New York,NY,USA:Association for Computing Machinery,2012:131-144.
[24]
LeskovecJ, McauleyJ. Learning to discover social circles in ego networks. Advances in Neural Information Processing Systems,2012,1:539-547.
[25]
TakacL, ZabovskyM. Data analysis in public social networks∥International Scientific Conference and International Workshop Present Day Trends of Innovations 2012. Łomża:[ s.n],2012:1-6.
[26]
RossiR A, AhmedN K. The network data repository with interactive graph analytics and visualization∥Proceedings of the AAAI Conference on Artificial Intelligence. Austin,TX,USA:AAAI Press,2015:4292-4293.
[27]
LüL Y, ZhouT. Link prediction in complex networks:a survey. Physica A:Statistical Mechanics and its Applications,2011,390(6):1150-1170.
[28]
JehG, WidomJ. SimRank:A measure of structural⁃context similarity∥Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York,NY,USA:Association for Computing Machinery,2002:538-543.
[29]
GalárragaL A, TeflioudiC, HoseK,et al. AMIE:Association rule mining under incomplete evidence in ontological knowledge bases∥Proceedings of the 22nd International Conference on World Wide Web. New York,NY,USA:Association for Computing Machinery,2013:413-422.
[30]
BayardoR J, AgrawalR. Mining the most interesting rules∥Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York,NY,USA:Association for Computing Machinery,1999:145-154.
[31]
BrinS, MotwaniR, SilversteinC. Beyond market baskets:Generalizing association rules to correlations∥Proceedings of the 1997 ACM SIGMOD International Conference on Management of data. New York,NY,USA:Association for Computing Machinery,1997:265-276.