Paper title
Globetrotter: Connecting Languages by Connecting Images
Paper authors
Paper abstract
Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain. Our key insight is that, while languages may vary drastically, the underlying visual appearance of the world remains consistent. We introduce a method that uses visual observations to bridge the gap between languages, rather than relying on parallel corpora or topological properties of the representations. We train a model that aligns segments of text from different languages if and only if the images associated with them are similar and each image in turn is well-aligned with its textual description. We train our model from scratch on a new dataset of text in over fifty languages with accompanying images. Experiments show that our method outperforms previous work on unsupervised word and sentence translation using retrieval. Code, models and data are available at globetrotter.cs.columbia.edu.
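The alignment condition in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes images and text are already embedded in one shared vector space, and the function names, the product form of the target, and the squared-error loss are all hypothetical choices made for illustration. The last function shows what translation by retrieval could look like under the same assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_target(img_i, img_j, txt_i, txt_j):
    """Hypothetical target score for aligning text i with text j.

    The score is high only when the two images are similar AND each
    image matches its own text. Negative similarities are clipped to 0.
    """
    visual = max(0.0, cosine(img_i, img_j))   # are the images similar?
    own_i = max(0.0, cosine(img_i, txt_i))    # does image i match text i?
    own_j = max(0.0, cosine(img_j, txt_j))    # does image j match text j?
    return visual * own_i * own_j


def alignment_loss(img_i, img_j, txt_i, txt_j):
    """Squared error between predicted text-text similarity and the target.

    Assumption: a simple regression loss stands in for whatever
    objective the paper actually uses.
    """
    predicted = max(0.0, cosine(txt_i, txt_j))
    return (predicted - alignment_target(img_i, img_j, txt_i, txt_j)) ** 2


def translate_by_retrieval(query, candidates):
    """Return the index of the candidate embedding closest to the query."""
    return max(range(len(candidates)), key=lambda k: cosine(query, candidates[k]))
```

In this sketch, two captions in different languages are pulled together only when all three factors are large, which mirrors the "if and only if" wording of the abstract; a pair with dissimilar images gets a target of zero regardless of how well each caption fits its own image.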