Open Domain Dialogue Generation with Latent Images

Paper Title

Open Domain Dialogue Generation with Latent Images

Authors

Yang, Ze; Wu, Wei; Hu, Huang; Xu, Can; Wang, Wei; Li, Zhoujun

Abstract

We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.
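
The abstract states that the two dialogue types are learned "within a conditional variational auto-encoding framework." A minimal sketch of what that generally means is below, under loudly stated assumptions: the latent image of a textual dialogue is summarized as a diagonal-Gaussian vector z, the posterior q(z|c,r) and prior p(z|c) are given as (mean, log-variance) pairs, and the response generator is any callable returning log p(r|c,z). The names `gaussian_kl`, `reparameterize`, `cvae_elbo`, and `dialogue_objective` are hypothetical. The paper itself recovers latent images with a text-to-image generator and also trains an image reconstructor; neither is modeled here, so this is a generic CVAE bound, not the authors' exact objective.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# ASSUMPTION: a Gaussian is stored as (mean, log-variance), one entry per latent dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p) -> float:
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def reparameterize(mu, logvar, rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def cvae_elbo(log_p_response: Callable[[Vector], float], posterior: Gaussian,
              prior: Gaussian, num_samples: int = 1,
              rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of E_q(z|c,r)[log p(r|c,z)] - KL(q(z|c,r) || p(z|c))."""
    if rng is None:
        rng = random.Random(0)
    mu_q, logvar_q = posterior
    mu_p, logvar_p = prior
    # Reconstruction term: average response log-likelihood over sampled latents.
    recon = sum(log_p_response(reparameterize(mu_q, logvar_q, rng))
                for _ in range(num_samples)) / num_samples
    # Regularizer: keep the posterior close to the context-conditioned prior.
    return recon - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)


def dialogue_objective(log_p_response, image=None, posterior=None, prior=None,
                       rng=None) -> float:
    """Hypothetical per-example objective covering both dialogue types."""
    if image is not None:
        # Image-grounded dialogue: the image is observed, so condition on it directly.
        return log_p_response(image)
    # Textual dialogue: the image is latent, so maximize the lower bound instead.
    return cvae_elbo(log_p_response, posterior, prior, rng=rng)


if __name__ == "__main__":
    # Toy response log-likelihood that prefers latent vectors near the origin.
    toy_log_p = lambda z: -0.5 * sum(x * x for x in z)
    posterior = ([0.5, -0.2], [-1.0, -1.0])
    prior = ([0.0, 0.0], [0.0, 0.0])
    print("image-grounded:", dialogue_objective(toy_log_p, image=[0.1, 0.2]))
    print("textual (ELBO):", dialogue_objective(toy_log_p, posterior=posterior, prior=prior))
```

Treating image-grounded examples as fully observed and textual ones as carrying a latent variable is the usual pattern for training one latent-variable model on partially labeled data; it is shown here only to illustrate how both kinds of dialogue could share a single training loop.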