In this paper, data-to-text generation technology is applied to an evaluation system for the writing standard of chalk characters. The baseline model is first improved: input sequences are encoded with a bidirectional gated recurrent unit (BiGRU), and during the grouping-planning phase a GRU generates, for each sentence, the subset of input items to be covered. Because directly concatenating BiGRU state information may not adequately capture the complex relationships between key-value pairs, a multi-head self-attention mechanism is introduced before data grouping planning to model key-value relationships at a finer granularity. Experiments on a handwritten Chinese character dataset show that the proposed method achieves 0.68, 0.75 and 0.67 on BLEU-4, ROUGE and METEOR, respectively, demonstrating practical value for the automatic evaluation of chalk handwriting standards.
In the standards used to evaluate the professional competence of pre-service teachers, the regularity of chalk handwriting and the neatness of its layout have long been important criteria [1]. Evaluating chalk handwriting against writing standards allows students' writing problems to be corrected promptly and their writing ability to improve. Automatic evaluation of chalk handwriting standards is usually approached in one of two ways: image-to-text generation or data-to-text generation, both important subtasks of natural language generation (NLG). Image-to-text generation aims to convert an image into a natural language description, drawing on knowledge from image processing, computer vision and natural language processing (NLP). With the widespread adoption of deep learning, researchers have proposed the multimodal recurrent neural network (m-RNN) and methods combining deep convolutional neural networks (DCNN) with long short-term memory networks (LSTM) for the semantic description and understanding of image content [2]. This approach has drawbacks: there is a semantic mismatch between the input image and the output text, and the heterogeneity between the encoder and decoder networks makes it difficult to extract all of the image information. The other common approach is data-to-text generation, which is more direct and avoids the semantic-gap problem of image-to-text generation [2]. By understanding and analysing structured [3-4] or unstructured [5] data, it converts the data into fluent, faithful descriptive text. In the chalk handwriting evaluation task, the key problem is how to automatically generate accurate and unambiguous evaluation sentences, which places strict demands on the extraction of image information. Since our research group has already extracted features from the chalk handwriting images, image-to-text generation is no longer applicable to this task. Weighing the characteristics of the two approaches against the task requirements, data-to-text generation was therefore chosen to implement the evaluation of chalk handwriting standards.
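To make the data-to-text setting concrete, a record in this task can be pictured as a list of key-value pairs extracted from a chalk handwriting image, which the model must verbalise as an evaluation sentence. The feature names, values and reference sentence below are invented for illustration; they are not the paper's actual dataset schema:

```python
# Hypothetical structured input for the chalk handwriting task
# (invented feature names and values, for illustration only).
record = [
    ("stroke_thickness", "even"),           # key-value pairs extracted
    ("character_slant", "slightly right"),  # from a chalk handwriting image
    ("layout_alignment", "neat"),
]

# The model's goal is to map such key-value pairs to a fluent evaluation
# sentence, e.g. "The strokes are even, the characters lean slightly to
# the right, and the overall layout is neat."
keys = [k for k, _ in record]
print(keys)
```

The grouping-planning phase described in the abstract decides which subset of these keys each generated sentence should cover.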
Early research on data-to-text generation mainly used template-based methods①②: text is generated by filling predefined templates, which is fast and offers strong controllability. However, the rules and templates must be written by hand, the approach is hard to extend to other domains, and the generated text lacks diversity. With the development of neural network techniques, methods based on neural sequence generation have gradually become mainstream. Such methods learn the mapping from input data to output text end to end, require no hand-crafted templates or rules, avoid domain dependence, and extend more easily to other applications; however, training relies on large amounts of annotated corpora, and the results are easily affected by the dataset. Mei et al. [6] proposed an end-to-end neural model based on the encoder-decoder framework, encoding the data with a bidirectional long short-term memory network (BiLSTM). Lebret et al. [7] proposed a neural model based on conditional neural language models to generate biography text from the fact tables of Wikipedia biographies. Li et al.① introduced a delayed copy mechanism into the encoder-decoder framework, first generating a text template and then filling in the concrete values. Chen et al. [8] used external knowledge to augment data-to-text models and improve the faithfulness of the generated text. Puduppully et al. [9] proposed inducing a macro plan from the training data and feeding it back into the text generation stage.

① DUBOUE P A, MCKEOWN K R. Statistical acquisition of content selection rules for natural language generation[C]//Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2003: 121-128.
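The multi-head self-attention mechanism that this paper introduces before grouping planning can be sketched in plain NumPy. The shapes, random weights and function signature below are illustrative assumptions, not the paper's implementation; the sketch only shows the standard scaled dot-product formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product multi-head self-attention.
    X: (seq_len, d_model); each W*: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # split each projection into heads: (n_heads, seq_len, d_head)
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)    # (n_heads, seq_len, seq_len)
    heads = attn @ Vh                  # (n_heads, seq_len, d_head)
    # concatenate heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, n_heads = 8, 5, 2
X = rng.normal(size=(seq_len, d_model))   # e.g. encoded key-value pairs
Ws = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
out = multi_head_self_attention(X, *Ws, n_heads=n_heads)
print(out.shape)  # (5, 8)
```

Applied to the BiGRU encodings of key-value pairs, each head can attend to a different pairwise relationship, which is the finer-grained modelling the abstract refers to.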
Three automatic evaluation metrics were adopted for the chalk handwriting evaluation task: BLEU (bilingual evaluation understudy) [15], ROUGE (recall-oriented understudy for gisting evaluation)① and METEOR (metric for evaluation of translation with explicit ordering)②. BLEU-4 measures the similarity between the generated text and the reference text; the higher the score, the closer the model output is to the reference. ROUGE evaluates text quality by counting the words that overlap between the generated and reference texts. METEOR combines n-gram matching with lexical-level matching such as stems and synonyms to evaluate accuracy and fluency, and introduces semantic relatedness to account for the semantic precision between generated and reference texts. The calculation formulas for the three metrics are given below.
$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Bigl(\sum_{n=1}^{N} w_n \log p_n\Bigr),$$

where $w_n$ is the weight of each n-gram order, $p_n$ is the n-gram precision, and BP is the brevity penalty.
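As a concrete illustration, BLEU as defined above can be computed with a short self-contained sketch (uniform weights $w_n = 1/N$, a single reference, and hypothetical tokenised sentences; this is a simplification, not the paper's exact evaluation script):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty,
    following the formula above (single reference; illustrative sketch)."""
    weights = [1.0 / max_n] * max_n
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # clipped (modified) n-gram precision p_n
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        p_n = overlap / total
        if p_n == 0:          # any zero precision drives the score to 0
            return 0.0
        log_sum += w * math.log(p_n)
    # brevity penalty BP penalises candidates shorter than the reference
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)

cand = "the strokes are even and the layout is neat".split()
ref = "the strokes are even and the layout is neat".split()
print(round(bleu(cand, ref), 2))  # identical sentences score 1.0
```

ROUGE and METEOR additionally bring in recall-oriented overlap and stem/synonym matching, respectively, which a precision-only sketch like this does not capture.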
① LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]//Text Summarization Branches Out. Stroudsburg: Association for Computational Linguistics, 2004: 74-81.
② BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Stroudsburg: Association for Computational Linguistics, 2005: 65-72.
① KIDDON C, ZETTLEMOYER L, CHOI Y. Globally coherent text generation with neural checklist models[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2016: 329-339.
[4] SHA L, MOU L, LIU T, et al. Order-planning neural text generation from structured data[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2018: 5414-5421.
[5] YANG J Y. Research on context-aware text generation from figures in biological literature[D]. Wuhan: Wuhan University of Technology, 2021. (in Chinese)
[6] MEI H, BANSAL M, WALTER M R. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment[J]. arXiv preprint arXiv:1509.00838, 2015.
[7] LEBRET R, GRANGIER D, AULI M. Neural text generation from structured data with application to the biography domain[J]. arXiv preprint arXiv:1603.07771, 2016.
[8] CHEN W, SU Y, YAN X, et al. KGPT: Knowledge-grounded pre-training for data-to-text generation[J]. arXiv preprint arXiv:2010.02307, 2020.
[9] PUDUPPULLY R, LAPATA M. Data-to-text generation with macro planning[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 510-527.
[10] CHENG J, DONG L, LAPATA M. Long short-term memory-networks for machine reading[J]. arXiv preprint arXiv:1601.06733, 2016.
[11] SHAO Z, HUANG M, WEN J, et al. Long and diverse text generation with planning-based hierarchical variational model[J]. arXiv preprint arXiv:1908.06605, 2019.
[12] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
[15] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2002: 311-318.
[16] LIU T, WANG K, SHA L, et al. Table-to-text generation by structure-aware seq2seq learning[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 4881-4888.
[17] MA W, NI Z, CAO K, et al. Seq2Tree: A tree-structured extension of LSTM network[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: Curran Associates, 2017: 1-5.
[18] ZHAO T, ZHAO R, ESKENAZI M. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders[J]. arXiv preprint arXiv:1703.10960, 2017.
[19] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 2692-2700.