Paper title
Open Domain Dialogue Generation with Latent Images
Authors
Abstract
We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.