Paper Title

Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion

Authors

Nisha Huang, Fan Tang, Weiming Dong, Changsheng Xu

Abstract

Digital art synthesis is receiving increasing attention in the multimedia community because it engages the public with art effectively. Current digital art synthesis methods usually use single-modality inputs as guidance, thereby limiting the expressiveness of the model and the diversity of generated results. To solve this problem, we propose the multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation approach that uses multimodal prompts as guidance to control a classifier-free diffusion model. Additionally, the contrastive language-image pretraining (CLIP) model is used to unify the text and image modalities. Extensive experimental results on the quality and quantity of the generated digital art paintings confirm the effectiveness of combining the diffusion model with multimodal guidance. Code is available at https://github.com/haha-lisa/MGAD-multimodal-guided-artwork-diffusion.
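
The abstract describes CLIP being used to embed text prompts and guidance images in a shared space so that both modalities can steer the diffusion sampler. The following is a minimal, hypothetical sketch (not the authors' released code) of how such a multimodal CLIP guidance loss might be computed at each sampling step; the function name `multimodal_clip_loss`, the weighting parameters, and the choice of the OpenAI `clip` package are illustrative assumptions.

```python
# Hypothetical sketch of multimodal CLIP guidance, assuming the OpenAI `clip`
# package (pip install git+https://github.com/openai/CLIP.git). Not the MGAD code.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)

def multimodal_clip_loss(denoised, text_prompt, style_image,
                         text_weight=1.0, image_weight=1.0):
    """Cosine-distance loss between a denoised image estimate and two guidance
    modalities (a text prompt and a reference image), both encoded with CLIP.
    `denoised` and `style_image` are expected to be CLIP-preprocessed batches."""
    # Encode the current denoised estimate produced by the diffusion model.
    image_feat = F.normalize(clip_model.encode_image(denoised), dim=-1)

    # Encode the text prompt.
    tokens = clip.tokenize([text_prompt]).to(device)
    text_feat = F.normalize(clip_model.encode_text(tokens), dim=-1)

    # Encode the guidance image (e.g., a style reference).
    style_feat = F.normalize(clip_model.encode_image(style_image), dim=-1)

    # Cosine distances to each modality; lower means closer agreement.
    text_loss = (1.0 - (image_feat * text_feat).sum(dim=-1)).mean()
    image_loss = (1.0 - (image_feat * style_feat).sum(dim=-1)).mean()
    return text_weight * text_loss + image_weight * image_loss
```

In a CLIP-guided sampler, the gradient of such a loss with respect to the denoised estimate could be used to nudge each reverse-diffusion step toward both prompts; the relative weights control the balance between text and image guidance.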
