Paper Title
A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions
Paper Authors
Paper Abstract
Recent models achieve promising results in visually grounded dialogues. However, existing datasets often contain undesirable biases and lack sophisticated linguistic analyses, which makes it difficult to understand how well current models recognize their precise linguistic structures. To address this problem, we make two design choices: first, we focus on the OneCommon Corpus \citep{udagawa2019natural,udagawa2020annotated}, a simple yet challenging common grounding dataset that contains minimal biases by design; second, we analyze the linguistic structures of its dialogues based on \textit{spatial expressions} and provide comprehensive, reliable annotation for 600 dialogues. We show that our annotation captures important linguistic structures, including predicate-argument structure, modification, and ellipsis. In our experiments, we assess the models' understanding of these structures through reference resolution. We demonstrate that our annotation can reveal both the strengths and weaknesses of baseline models at essential levels of detail. Overall, we propose a novel framework and resource for investigating fine-grained language understanding in visually grounded dialogues.