Paper Title
Word Shape Matters: Robust Machine Translation with Visual Embedding
Authors
Abstract
Neural machine translation has achieved remarkable empirical performance on standard benchmark datasets, yet recent evidence suggests that these models can still fail easily when dealing with substandard inputs such as misspelled words. To overcome this issue, we introduce a new encoding heuristic for the input symbols of character-level NLP models: it encodes the shape of each character through images depicting the letters as printed. We name this new strategy visual embedding, and we expect it to improve the robustness of NLP models because humans also process text visually through printed letters, rather than through mechanical one-hot vectors. Empirically, our method improves models' robustness against substandard inputs, even in test scenarios where the models face noise beyond what was available during the training phase.
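The core idea can be sketched in a few lines: render each character as a small glyph image and use the flattened pixel values as its embedding vector. The font, image size, and normalization below are illustrative choices, not the paper's exact setup.

```python
# Minimal sketch of a visual (glyph-based) character embedding.
# Assumes Pillow and NumPy are installed; the specific font and
# 16x16 canvas are arbitrary choices for illustration.
from PIL import Image, ImageDraw, ImageFont
import numpy as np

def visual_embedding(char: str, size: int = 16) -> np.ndarray:
    """Render `char` as a size x size grayscale glyph and flatten it
    into a vector in [0, 1], so visually similar characters map to
    nearby points in embedding space."""
    img = Image.new("L", (size, size), color=0)   # black canvas
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()               # built-in bitmap font
    draw.text((2, 2), char, fill=255, font=font)  # draw glyph in white
    return np.asarray(img, dtype=np.float32).flatten() / 255.0

emb = visual_embedding("a")
print(emb.shape)  # one 256-dimensional vector per character
```

Unlike a one-hot encoding, where a misspelling swaps in an entirely orthogonal vector, a glyph-based embedding keeps visually confusable characters close together, which is the intuition behind the robustness claim above.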