Paper Title
Learning Diverse Tone Styles for Image Retouching
Paper Authors
Paper Abstract
Image retouching, which aims to regenerate visually pleasing renditions of given images, is a subjective task, since different users have different aesthetic sensations. Most existing methods deploy a deterministic model to learn the retouching style of a specific expert, making them inflexible to diverse subjective preferences. Moreover, the intrinsic diversity of a single expert, which arises from image-specific processing, is also poorly captured. To circumvent these issues, we propose to learn diverse image retouching with a normalizing flow-based architecture. Unlike current flow-based methods, which directly generate the output image, we argue that learning in a style domain can (i) disentangle the retouching styles from the image content, (ii) yield a stable style representation, and (iii) avoid spatial disharmony effects. To obtain meaningful image tone style representations, we carefully design a joint-training pipeline composed of a style encoder, a conditional RetouchNet, and an image tone style normalizing flow (TSFlow) module. In particular, the style encoder predicts the target style representation of an input image, which serves as conditional information for the RetouchNet during retouching, while TSFlow maps the style representation vector to a Gaussian distribution in the forward pass. After training, TSFlow can generate diverse image tone style vectors by sampling from the Gaussian distribution. Extensive experiments on the MIT-Adobe FiveK and PPR10K datasets show that the proposed method performs favorably against state-of-the-art methods and effectively generates diverse results that satisfy different human aesthetic preferences. Source code and pre-trained models are publicly available at https://github.com/SSRHeart/TSFlow.
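To make the sampling mechanism in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a normalizing flow over tone style vectors: the forward pass maps a style representation to a Gaussian latent (trained by negative log-likelihood), and the inverse pass turns Gaussian samples back into diverse style vectors that could condition a retouch network. Every name and dimension here (STYLE_DIM, AffineCoupling, TSFlowSketch) is a hypothetical stand-in for illustration, not the official TSFlow code.

```python
# Illustrative sketch of a style-vector normalizing flow; not the official TSFlow.
import torch
import torch.nn as nn

STYLE_DIM = 16  # hypothetical size of the image tone style vector


class AffineCoupling(nn.Module):
    """One invertible coupling step: the first half of the vector predicts a
    scale/shift that is applied to the second half."""

    def __init__(self, dim: int):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64),
            nn.ReLU(),
            nn.Linear(64, (dim - self.half) * 2),
        )

    def forward(self, x):
        x1, x2 = x[:, : self.half], x[:, self.half :]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)  # bound the log-scale for numerical stability
        y2 = x2 * torch.exp(s) + t
        return torch.cat([x1, y2], dim=1), s.sum(dim=1)  # output, log|det J|

    def inverse(self, y):
        y1, y2 = y[:, : self.half], y[:, self.half :]
        s, t = self.net(y1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=1)


class TSFlowSketch(nn.Module):
    """Stacked coupling steps with a fixed permutation (flip) between them.
    forward(): style vector -> Gaussian latent (trained by NLL).
    sample(): Gaussian noise -> diverse tone style vectors."""

    def __init__(self, dim: int = STYLE_DIM, steps: int = 4):
        super().__init__()
        self.steps = nn.ModuleList(AffineCoupling(dim) for _ in range(steps))

    def forward(self, z_style):
        log_det = z_style.new_zeros(z_style.size(0))
        for step in self.steps:
            z_style, ld = step(z_style)
            z_style = z_style.flip(dims=[1])  # so both halves get transformed
            log_det = log_det + ld
        return z_style, log_det

    @torch.no_grad()
    def sample(self, n: int):
        z = torch.randn(n, STYLE_DIM)  # draw from the Gaussian prior
        for step in reversed(self.steps):
            z = z.flip(dims=[1])  # undo the permutation
            z = step.inverse(z)
        return z  # style vectors to condition a retouch network on


if __name__ == "__main__":
    flow = TSFlowSketch()
    z_style = torch.randn(4, STYLE_DIM)  # stand-in for style-encoder outputs
    u, log_det = flow(z_style)
    # Flow training objective: negative log-likelihood under the Gaussian
    # prior (up to an additive constant).
    nll = (0.5 * u.pow(2).sum(dim=1) - log_det).mean()
    diverse_styles = flow.sample(n=5)  # five different tone styles
    print(nll.item(), diverse_styles.shape)  # -> scalar, torch.Size([5, 16])
```

The sketch only captures the forward-to-Gaussian / inverse-sampling mechanism named in the abstract; the actual TSFlow module and its conditioning of the RetouchNet are described in the paper and the repository above.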