Unlike previous studies that used a bilingual corpus as the TM and source-side similarity search for memory retrieval, a new NMT framework was proposed that uses a monolingual translation memory, i.e., target-language sentences as the TM, and performs learnable retrieval in a cross-lingual manner. This framework has two main advantages: first, the cross-lingual memory network allows monolingual data to be used as the TM; second, the cross-lingual memory network and the NMT model are jointly optimized for the ultimate translation objective, enabling integrated end-to-end training. Experiments show that the proposed method achieves good results on four translation tasks, and the model is also effective in low-resource scenarios.
Given an input source-language sentence X = x1 x2 … xi … xn, word embeddings are obtained through the Transformer embedding layer and arithmetically averaged to form the sentence vector e. The attentive memory network first retrieves helpful information from the monolingual translation memory according to the relevance function M(e, D), and then integrates it into the Transformer decoding process to guide generation. The construction of the sentence vector e is described in detail in the next section. Formally, the whole process of TM-guided machine translation is shown in Eq. (1).
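A plausible form of Eq. (1), under the assumption that the retrieved memory M(e, D) conditions every decoding step (a hedged reconstruction; y_t denotes the t-th target token, and the paper's exact notation may differ):

```latex
% Hedged reconstruction of Eq. (1): decoding conditioned on the source
% sentence X and on the information retrieved from the monolingual TM.
P(y \mid X, D) = \prod_{t=1}^{T} P\bigl(y_t \mid y_{<t},\, X,\, M(e, D)\bigr) \tag{1}
```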
The sentences stored in the given monolingual translation memory D are target-side TM sentences, on top of which the model performs cross-lingual TM matching. The given TM is usually similar to the training corpus, so it provides a useful reference for model training, and the model must determine, for each input source sentence, which TM entries best describe it. For example, for the source sentence "the German defendant bought frozen pork from a Belgian company", an exact translation match may not be found in the TM, but as long as related sentences appear in the TM, they can provide explicit guidance for decoding.
The monolingual translation memory is defined as the set D = {d1, d2, …, dk, …, dl}. A Sentence-BERT model is first trained, and its neural network encodes every sentence in the TM into a fixed-length sentence vector, yielding two vector sets that represent the TM: the key vectors u = {u1, u2, …, uk, …, ul} and the value vectors v = {v1, v2, …, vk, …, vl}. Here uk and vk are identical, both 768-dimensional sentence vectors. Taking one sentence as an example, the sentence vectors are generated as follows:
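As a minimal sketch of this offline encoding step, assuming the sentence-transformers library and an illustrative 768-dimensional checkpoint name (the model identifier is an assumption, not the paper's; since uk and vk are identical here, one encoding pass yields both sets):

```python
# Minimal sketch of encoding the monolingual TM with Sentence-BERT.
# The checkpoint name is an illustrative assumption; any Sentence-BERT
# model with 768-dim output would match the setup described above.
from sentence_transformers import SentenceTransformer

tm_sentences = [
    "the German defendant bought frozen pork from a Belgian company",
    # ... the rest of the TM sentences d_1 ... d_l
]

encoder = SentenceTransformer("bert-base-nli-mean-tokens")  # 768-dim vectors
U = encoder.encode(tm_sentences)  # key vectors u_1 ... u_l, shape (l, 768)
V = U                             # value vectors v_k are identical to u_k
```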
To match the dimensionality of the TM sentence vectors uk and vk, the embedding dimension of both the encoder and the decoder is set to 768. The word vectors of each source sentence X are then summed and averaged, yielding a 1×768 sentence vector, denoted e, as shown in Eq. (2).
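Based on this description, Eq. (2) plausibly takes the following sum-then-average (mean pooling) form, where Emb(·) denotes the 768-dimensional Transformer embedding lookup (a hedged sketch rather than the paper's exact formula):

```latex
% Hedged sketch of Eq. (2): mean pooling of the word embeddings of
% X = x_1 x_2 ... x_n into a single 1x768 sentence vector e.
e = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Emb}(x_i) \tag{2}
```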
During TM retrieval, e serves as the query vector. In the vector space, the memory network computes the similarity between each uk and the sentence vector e of every input source sentence via an inner product followed by a Softmax, while vk serves as the latent sentence vector from which the final output o is generated. Specifically, the attentive memory network first takes the inner product of the encoder-side sentence vector e with every sentence vector uk in the monolingual TM, and then a Softmax function normalizes these inner-product scores into a probability pk over the sequence of hidden states. The computation is shown in Eq. (3).
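A minimal sketch of this attention step, assuming the TM keys U and values V from the encoding sketch above and a mean-pooled query e (numpy is used for brevity, and the variable names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()    # subtract the max for numerical stability
    exp = np.exp(z)
    return exp / exp.sum()

def memory_attention(e, U, V):
    """Eq. (3)-style attention over the monolingual TM.

    e: (768,)   query sentence vector from the source side
    U: (l, 768) key vectors u_k of the TM sentences
    V: (l, 768) value vectors v_k (identical to U in this setup)
    """
    scores = U @ e     # inner product of e with every u_k, shape (l,)
    p = softmax(scores)  # normalized probabilities p_k over TM entries
    o = p @ V          # output o as the p_k-weighted sum of the v_k
    return o, p
```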
[2] Sutskever I, Vinyals O, Le Q. Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems. Montreal, Canada, 2014: 3104-3112.
[3] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[C]//Proceedings of International Conference on Learning Representations. San Diego, USA, 2015.
[4] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach, USA, 2017: 5998-6008.
[8] Gu J, Wang Y, Cho K, et al. Search engine guided non-parametric neural machine translation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans, USA, 2018: 5133-5140.
[9] Zhang J, Utiyama M, Sumita E, et al. Guiding neural machine translation with retrieved translation pieces[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. New Orleans, USA, 2018: 1325-1335.
[10] Bapna A, Firat O. Non-parametric adaptation for neural machine translation[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, USA, 2019: 1921-1931.
[11] Cao Q, Xiong D. Encoding gated translation memory into neural machine translation[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium, 2018: 3041-3047.
[12] Bulte B, Tezcan A. Neural fuzzy repair: integrating fuzzy matches into neural machine translation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy, 2019: 1800-1809.
[13] Xu J T, Crego J M, Senellart J. Boosting neural machine translation with similar translations[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online, 2020: 1580-1590.
Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using siamese BERT-networks[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Hong Kong, China, 2019: 3982-3992.
[16] Koehn P, Hoang H, Birch A, et al. Moses: open source toolkit for statistical machine translation[C]//Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic, 2007.
[17] Sennrich R, Haddow B, Birch A. Neural machine translation of rare words with subword units[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, 2016.
[18] Papineni K, Roukos S, Ward T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, USA, 2002: 311-318.
[19] Kingma D, Ba J. Adam: a method for stochastic optimization[C]//International Conference on Learning Representations. San Diego, USA, 2015.
[20] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.