Language-guided Semantic Style Transfer of 3D Indoor Scenes

Paper Title

Language-guided Semantic Style Transfer of 3D Indoor Scenes

Authors

Jin, Bu; Tian, Beiwen; Zhao, Hao; Zhou, Guyue

Abstract


We address the new problem of language-guided semantic style transfer of 3D indoor scenes. The input is a 3D indoor scene mesh and several phrases that describe the target scene. First, 3D vertex coordinates are mapped to RGB residues by a multi-layer perceptron. Second, the colored 3D meshes are differentiably rendered into 2D images, using a viewpoint sampling strategy tailored for indoor scenes. Third, the rendered 2D images are compared to the phrases using pre-trained vision-language models. Finally, errors are back-propagated to the multi-layer perceptron to update the vertex colors corresponding to certain semantic categories. We conducted large-scale qualitative analyses and A/B user tests on the public ScanNet and SceneNN datasets. We demonstrate that: (1) the results are visually pleasing and potentially useful for multimedia applications; (2) rendering 3D indoor scenes from viewpoints consistent with human priors is important; (3) incorporating semantics significantly improves style transfer quality; (4) an HSV regularization term leads to results that are more consistent with the inputs and generally rated better. Code and the user study toolbox are available at https://github.com/AIR-DISCOVER/LASST
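The color-update and regularization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the multi-layer perceptron, the differentiable renderer, and the vision-language model are omitted, and the function names `apply_residues` and `hsv_penalty`, the clamping to [0, 1], and the squared-difference form of the HSV term are all assumptions made for illustration. The sketch only shows how RGB residues might be restricted to one semantic category and how a penalty in HSV space could measure drift from the input colors.

```python
import colorsys

def apply_residues(colors, residues, labels, target_label):
    """Add RGB residues only to vertices whose semantic label matches.

    Hypothetical helper: colors and residues are lists of (r, g, b) tuples,
    with colors in [0, 1]. Updated channels are clamped to [0, 1].
    """
    out = []
    for rgb, res, label in zip(colors, residues, labels):
        if label == target_label:
            rgb = tuple(min(1.0, max(0.0, c + r)) for c, r in zip(rgb, res))
        out.append(tuple(rgb))
    return out

def hsv_penalty(original, stylized):
    """Mean squared HSV difference between two color lists.

    Assumed form of an HSV regularizer; hue is treated as circular,
    so hues 0.95 and 0.05 are 0.1 apart, not 0.9.
    """
    total = 0.0
    for a, b in zip(original, stylized):
        h1, s1, v1 = colorsys.rgb_to_hsv(*a)
        h2, s2, v2 = colorsys.rgb_to_hsv(*b)
        dh = abs(h1 - h2)
        dh = min(dh, 1.0 - dh)
        total += dh ** 2 + (s1 - s2) ** 2 + (v1 - v2) ** 2
    return total / len(original)

# Two vertices: one labeled "wall", one labeled "floor".
colors = [(0.5, 0.5, 0.5), (0.2, 0.4, 0.6)]
residues = [(0.1, 0.0, 0.0), (0.9, 0.9, 0.9)]
labels = ["wall", "floor"]

stylized = apply_residues(colors, residues, labels, "floor")
penalty = hsv_penalty(colors, stylized)
```

In this example only the "floor" vertex changes color, and the penalty grows as the stylized colors move away from the input in HSV space. In a full pipeline such a term would be added to the image-text matching loss before back-propagation.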


Paper Title