Opal: Multimodal Image Generation for News Illustration

Paper Title

Opal: Multimodal Image Generation for News Illustration

Authors

Vivian Liu, Han Qiao, Lydia Chilton

Abstract

Advances in multimodal AI have presented people with powerful ways to create images from text. Recent work has shown that text-to-image generations are able to represent a broad range of subjects and artistic styles. However, finding the right visual language for text prompts is difficult. In this paper, we address this challenge with Opal, a system that produces text-to-image generations for news illustration. Given an article, Opal guides users through a structured search for visual concepts and provides a pipeline allowing users to generate illustrations based on an article's tone, keywords, and related artistic styles. Our evaluation shows that Opal efficiently generates diverse sets of news illustrations, visual assets, and concept ideas. Users with Opal generated two times more usable results than users without. We discuss how structured exploration can help users better understand the capabilities of human-AI co-creative systems.
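As a rough illustration of the idea of generating illustrations from an article's tone, keywords, and related artistic styles, the sketch below combines those three ingredients into candidate text prompts. It is not Opal's implementation: the function name `build_prompts`, the prompt template, and the sample inputs are all assumptions made for this example, and the abstract does not specify how the system composes its prompts.

```python
from itertools import product


def build_prompts(keywords, tones, styles):
    """Combine every keyword, tone, and style into one candidate text prompt.

    Hypothetical helper for illustration only; the template string is an
    assumption, not the format used by the system described in the abstract.
    """
    prompts = []
    for keyword, tone, style in product(keywords, tones, styles):
        prompts.append(f"{keyword}, {tone} mood, in the style of {style}")
    return prompts


# Hypothetical selections a user might make for an article on renewable energy.
keywords = ["wind turbine", "power grid"]
tones = ["hopeful"]
styles = ["watercolor", "editorial cartoon"]

prompts = build_prompts(keywords, tones, styles)
for prompt in prompts:
    print(prompt)
```

Enumerating the full cross product keeps the search structured: every keyword is paired with every tone and style, so the candidate set can be reviewed systematically before any images are generated.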
