Paper Title
Visual Grounding of Inter-lingual Word-Embeddings
Authors
Abstract
Visual grounding of language aims at enriching textual representations of language with multiple sources of visual knowledge, such as images and videos. Although visual grounding is an area of intense research, its inter-lingual aspects have not received much attention. The present study investigates the inter-lingual visual grounding of word embeddings. We propose an implicit alignment technique between the vision and language spaces in which textual information from different languages interacts to enrich pre-trained textual word embeddings. Our experiments focus on three languages: English, Arabic, and German. We obtained visually grounded vector representations for these languages and studied whether visual grounding on one or multiple languages improved the performance of the embeddings on word similarity and categorization benchmarks. Our experiments suggest that inter-lingual knowledge improves the performance of grounded embeddings in similar languages such as German and English. However, inter-lingual grounding of German or English with Arabic led to a slight degradation in performance on word similarity benchmarks. On categorization benchmarks, by contrast, we observed the opposite trend, with inter-lingual grounding alongside Arabic yielding the largest improvement for English. Several reasons for these findings are laid out in the discussion section. We hope that our experiments provide a baseline for further research on inter-lingual visual grounding.
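Word similarity benchmarks of the kind mentioned above are commonly scored by computing the cosine similarity of each word pair's embeddings and reporting Spearman's rank correlation with human similarity ratings. The abstract does not state exactly how the authors scored their embeddings, so the following is only a minimal sketch under that common protocol. It also assumes that pairs with an out-of-vocabulary word are skipped. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for constant rankings
    return cov / (sx * sy)


def evaluate_similarity(embeddings, pairs):
    """Hypothetical helper: correlate model similarities with human ratings.

    embeddings: dict mapping word -> vector.
    pairs: list of (word1, word2, human_score).
    Returns (rho, number_of_pairs_used); out-of-vocabulary pairs are skipped.
    """
    model, gold = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            gold.append(score)
    return spearman(model, gold), len(model)


if __name__ == "__main__":
    # Toy, made-up vectors and ratings, for illustration only.
    emb = {
        "cat": [1.0, 0.9, 0.0],
        "dog": [0.9, 1.0, 0.1],
        "car": [0.0, 0.2, 1.0],
        "truck": [0.1, 0.1, 0.9],
    }
    pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5),
             ("cat", "car", 1.5), ("dog", "unicorn", 2.0)]
    rho, n = evaluate_similarity(emb, pairs)
    print(f"Spearman rho = {rho:.3f} over {n} pairs")
```

On the toy data, the model ranks the three in-vocabulary pairs in the same order as the ratings, so rho comes out at approximately 1.0 over 3 pairs. The "unicorn" pair is dropped. Rank correlation is the usual choice here because only the ordering of similarities is compared with human judgments, not their absolute scale. Reporting the number of pairs used alongside rho matters when comparing languages whose vocabularies cover different fractions of a benchmark.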