Research Hotspot Mining Under a Contextual Autoencoding Framework
Efficiently mining research hotspots and their corresponding authors is an important task in academic research. Traditional author topic models overlook contextual semantics, struggle to incorporate external knowledge, and do not model background topics. To address these limitations, this paper proposes a contextualized neural author topic model. The model uses a Transformer to capture the contextual semantics of text and thereby improve the accuracy of topic inference. It incorporates pre-trained word and author embeddings into the decoding process and models topics with the von Mises-Fisher (vMF) distribution to improve topic quality. It also adopts a Dirichlet tree distribution as the prior to distinguish background topics from hotspot topics. In addition, the paper introduces two metrics that quantify the degree of association between research hotspots and authors. Experiments on three constructed datasets (computational linguistics, computer vision, and data mining) show that the model outperforms the compared methods in topic coherence, topic diversity, and author-topic association, supporting its effectiveness for research hotspot mining.
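The abstract does not give the model's equations, so the following is only a minimal sketch of one common way to realise two ideas it names, not the authors' implementation. It assumes that a topic is a unit direction on the sphere, that word embeddings are L2-normalised, and that all topics share one concentration parameter `kappa`. Under those assumptions the vMF normalising constant cancels, and the topic-word distribution reduces to a softmax over `kappa` times cosine similarity. The function names, the toy vectors, and the diversity definition (fraction of unique words among all topics' top words) are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def vmf_topic_word_distribution(topic_direction, word_embeddings, kappa):
    """Hypothetical vMF-style decoder for a single topic.

    Each word is scored by kappa * cos(topic, word), the exponent of a
    vMF density evaluated at the word embedding. Normalising over the
    vocabulary cancels the vMF constant, leaving a softmax.
    """
    mu = l2_normalize(topic_direction)
    scores = []
    for emb in word_embeddings:
        unit_emb = l2_normalize(emb)
        scores.append(kappa * sum(m * x for m, x in zip(mu, unit_emb)))
    # Subtract the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]


def topic_diversity(topics_top_words):
    """Fraction of unique words among the top words of all topics.

    This is one widely used definition; the abstract does not state
    which diversity measure the paper reports.
    """
    all_words = [w for topic in topics_top_words for w in topic]
    return len(set(all_words)) / len(all_words)


# Toy usage with made-up 2-D embeddings (illustrative values only).
probs = vmf_topic_word_distribution(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    kappa=5.0,
)
diversity = topic_diversity([["parsing", "syntax"], ["syntax", "vision"]])
```

A larger `kappa` concentrates probability mass on the words closest to the topic direction, which is the sense in which a vMF parameterisation can sharpen topics. Author embeddings could be scored against topic directions in the same way, although the abstract does not say how the paper does this.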
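National Natural Science Foundation of China, Youth Fund Project (62102192)
China Postdoctoral Science Foundation, General Program (2022M710071)
Jiangsu Province "Double Innovation" Doctoral Talent Program (JSSCBS20210530)
Fundamental Research Funds for the Central Universities (aiia-24-01)
Fundamental Research Funds for the Central Universities (PA2025IISL0107)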