College of Computer Science and Technology,Inner Mongolia Normal University,Hohhot 010022,China
Article history
Received:
Accepted:
Published: 2024-09-25
Issue date: 2025-12-15
Abstract
Named entity recognition in the education domain currently faces a lack of annotated data, knowledge-intensive content with complex semantics, and imbalanced entity distributions, all of which limit the performance of existing models. This paper therefore proposes LAP-BERT (label adversarial pointer-bidirectional encoder representation from transformer), a named entity recognition model that integrates education-domain knowledge and is suited to low-resource educational scenarios. First, the semantics of label interpretation information are fused into the text as education-domain knowledge, addressing the complex features and limited samples of course texts. Second, adversarial training perturbs the word vectors to generate adversarial samples, which are integrated into the fusion-layer output to alleviate the imbalanced entity distribution. Finally, span-based decoding is adopted to resolve non-unique entity boundaries. Experimental results show that LAP-BERT improves the F1 score over other baseline models, demonstrating the advantages of the proposed method for named entity recognition in low-resource educational scenarios.
Compared with traditional rule-based and machine-learning methods, deep-learning methods require no hand-crafted rules or feature engineering and substantially outperform them. Among deep-learning methods, the recurrent neural network (RNN) is one of the models commonly used for named entity recognition, but the architecture suffers from vanishing and exploding gradients [14], so early entity recognition results were unsatisfactory. To overcome these shortcomings, Hammerton [15] first introduced the long short-term memory network (LSTM) into named entity recognition; the model adds input, forget, and output gates to decide which information to retain or discard. Building on LSTM, BiLSTM [16] was proposed, which processes both the left and right context of the text and thereby strengthens the recognition of key entities. In addition, Lample et al. [17] combined the conditional random field (CRF) with neural network models, and this combination has become a mainstream approach to named entity recognition.
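For reference, the input, forget, and output gates mentioned above take the standard textbook form (the notation below is generic, not specific to any cited model):

```latex
% Standard LSTM gate equations (\sigma: logistic sigmoid, \odot: element-wise product)
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The forget gate $f_t$ controls how much of the previous cell state is kept, which is what mitigates the vanishing-gradient problem of plain RNNs.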
Liu et al. [18] applied a BiLSTM-CNN-CRF model to entity recognition in ancient Chinese history and culture: word vectors were trained with the continuous bag-of-words model, character representation vectors were extracted from sentences with a CNN, the concatenation of character and word vectors was fed into a BiLSTM, and a CRF selected the optimal tag sequence to obtain the final entities. This model, however, must balance the local features captured by the CNN against the long-range dependencies captured by the BiLSTM, which may require carefully designed network structures and training strategies. Moreover, because the above methods focus mainly on features at the word or character level, they ignore the surrounding context and thus cannot represent polysemy. To address this, researchers introduced the Transformer-based BERT (bidirectional encoder representations from transformers) into named entity recognition to resolve polysemy. Wei et al. [19] proposed BERT-BiLSTM-CRF, a BERT-based method for named entity recognition in the educational emergency domain: BERT is trained on an educational emergency corpus to obtain vectorized word representations, a BiLSTM then produces contextual encodings of the serialized text, and finally a CRF decodes and annotates the sequence to obtain the corresponding entities. Li et al. [20] proposed the EduBERT-BiLSTM-CRF model, which fine-tunes BERT to adaptively capture effective information in the education domain and combines it with BiLSTM-CRF to recognize educational entities effectively. Although these methods achieve good results, they demand large amounts of data and are therefore ill-suited to named entity recognition in low-resource scenarios; nor do they account for the imbalanced entity distributions caused by the complexity of the data and the characteristics of knowledge distribution in the education domain. In addition, owing to semantic complexity, educational data are highly unstructured and entity boundaries are not unique.
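One way to generate adversarial samples by perturbing word vectors, as described in the abstract, is a Fast-Gradient-Method-style perturbation. The sketch below is illustrative only: the function name, inputs, and epsilon are assumptions for exposition, not the paper's implementation.

```python
import math

def fgm_perturb(embedding, grad, epsilon=0.5):
    """FGM-style perturbation: r = epsilon * g / ||g||_2, added to the embedding.

    embedding, grad: word vector and its loss gradient, as lists of floats
    (hypothetical example inputs; a real model would use tensors).
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)  # zero gradient: no perturbation direction
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]

# Toy example: ||(3, 4)|| = 5, so the perturbation is (0.3, 0.4)
print(fgm_perturb([1.0, 2.0], [3.0, 4.0], epsilon=0.5))
# -> approximately [1.3, 2.4]
```

Training on both the clean and the perturbed embeddings is what makes the model more robust to the sparse, imbalanced entity classes.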
In summary, to explore a named entity recognition method for the education domain in low-resource scenarios and to address the strong domain specificity, complex semantics, and imbalanced entity distributions of course texts, this paper proposes LAP-BERT (label adversarial pointer-bidirectional encoder representation from transformer), a model that integrates label interpretation information with adversarial training and adopts span-based decoding to resolve non-unique entity boundaries.
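As an illustration of span-based (pointer-style) decoding, the minimal sketch below pairs each predicted start position with the nearest subsequent predicted end position. The function name, threshold, and toy probabilities are hypothetical, not taken from the paper.

```python
def decode_spans(start_probs, end_probs, threshold=0.5):
    """Decode entity spans from per-token start/end probabilities.

    Each start position is paired with the nearest end position at or
    after it; returns inclusive (start, end) index pairs.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, min(candidates)))
    return spans

# Toy course-text example: the model marks token 0 as a start and token 3
# as an end, so the decoded entity is tokens 0..3 ("机器学习").
tokens = ["机", "器", "学", "习", "是", "一", "门", "课"]
start_p = [0.9, 0.1, 0.2, 0.1, 0.0, 0.1, 0.0, 0.1]
end_p   = [0.1, 0.2, 0.1, 0.8, 0.0, 0.1, 0.0, 0.2]
print(decode_spans(start_p, end_p))  # -> [(0, 3)]
```

Because spans are decoded independently per position rather than as one tag sequence, overlapping or ambiguous boundaries can coexist as separate candidate spans, which is why this style of decoding suits the non-unique-boundary problem described above.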
CAO P F, CHEN Y B, LIU K, et al. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018: 182-192.
[5]
LI J, SUN A X, HAN J L, et al. A survey on deep learning for named entity recognition[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(1): 50-70.
[6]
TRIEU H L, MIWA M, ANANIADOU S. Named entity recognition for cancer immunology research using distant supervision[C]//Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin: Association for Computational Linguistics, 2022: 171-177.
LI Y L. Research and application of deep learning in image recognition[C]//2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications (ICPECA). Shenyang: IEEE, 2022: 994-999.
LAURIOLA I, LAVELLI A, AIOLLI F. An introduction to deep learning in natural language processing: Models, techniques, and tools[J]. Neurocomputing, 2022, 470: 443-456.
[15]
HAMMERTON J. Named entity recognition with long short-term memory[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. Edmonton: ACL, 2003: 172-175.
[16]
HE W, XU Y, YU Q. BERT-BiLSTM-CRF Chinese resume named entity recognition combining attention mechanisms[C]//Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering. Dalian: ACM, 2023: 542-547.
[17]
LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition[J]. arXiv preprint arXiv:1603.01360, 2016.
[18]
LIU Y, WEI S, HUANG H, et al. Named entity recognition of citrus pests and diseases based on the BERT-BiLSTM-CRF model[J]. Expert Systems with Applications, 2023, 234: 121103.
[19]
WEI K, WEN B. Named entity recognition method for educational emergency field based on BERT[C]//2021 IEEE 12th International Conference on Software Engineering and Service Science (ICSESS). Beijing: IEEE, 2021: 145-149.
[20]
LI N, SHEN Q, SONG R, et al. MEduKG: A deep-learning-based approach for multi-modal educational knowledge graph construction[J]. Information, 2022, 13(2): 91.
YANG L, SHAMI A. On hyperparameter optimization of machine learning algorithms: Theory and practice[J]. Neurocomputing, 2020, 415: 295-316.
[24]
ZHAO W, LIU J. Application of knowledge map based on BiLSTM-CRF algorithm model in ideological and political education question answering system[J]. Mobile Information Systems, 2022, 2022: 4139323.
[25]
LIU S, YANG H, LI J Y, et al. Preliminary study on the knowledge graph construction of Chinese ancient history and culture[J]. Information, 2020, 11(4): 186.
[26]
AN Y, XIA X Y, CHEN X L, et al. Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF[J]. Artificial Intelligence in Medicine, 2022, 127: 102282.
[27]
WU S, SONG X N, FENG Z H, et al. NFLAT: Non-flat-lattice transformer for Chinese named entity recognition[J]. arXiv preprint arXiv:2205.05832, 2022.