Title
Physically Disentangled Representations
Authors
Abstract
State-of-the-art methods in generative representation learning yield semantic disentanglement, but typically do not consider physical scene parameters, such as geometry, albedo, lighting, or camera. We posit that inverse rendering, a way to reverse the rendering process to recover scene parameters from an image, can also be used to learn physically disentangled representations of scenes without supervision. In this paper, we show the utility of inverse rendering in learning representations that yield improved accuracy on downstream clustering, linear classification, and segmentation tasks with the help of our novel Leave-One-Out, Cycle Contrastive loss (LOOCC), which improves disentanglement of scene parameters and robustness to out-of-distribution lighting and viewpoints. We compare our method with other generative representation learning methods across a variety of downstream tasks, including face attribute classification, emotion recognition, identification, face segmentation, and car classification. Our physically disentangled representations yield higher accuracy than semantically disentangled alternatives across all tasks, by as much as 18%. We hope that this work will motivate future research in applying advances in inverse rendering and 3D understanding to representation learning.
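The abstract names the Leave-One-Out, Cycle Contrastive loss (LOOCC) but gives no implementation details. The following is a minimal, hypothetical PyTorch sketch of one plausible reading: per-factor embeddings (geometry, albedo, lighting, camera) from the original image are contrastively matched against embeddings re-encoded after a cycle in which one factor is swapped, with the swapped factor left out of the invariance constraint. The function names (`info_nce`, `loocc_sketch`), the factor dictionary, and the InfoNCE formulation are assumptions for illustration, not the authors' actual LOOCC definition.

```python
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Standard InfoNCE loss: each anchor's positive is the same-index row of
    `positive`; all other rows in the batch act as negatives.
    Both inputs are (batch, dim) embedding tensors."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)


def loocc_sketch(factors: dict[str, torch.Tensor],
                 reencoded: dict[str, torch.Tensor],
                 temperature: float = 0.1) -> torch.Tensor:
    """Hypothetical leave-one-out, cycle-contrastive objective (an assumption,
    not the paper's definition).

    `factors`: per-factor embeddings from encoding the original image, e.g.
    {"geometry": ..., "albedo": ..., "lighting": ..., "camera": ...}.
    `reencoded`: embeddings from re-encoding a render in which one factor was
    swapped. For each held-out factor, the render-and-re-encode cycle should
    preserve every *other* factor, so those pairs are pulled together."""
    losses = []
    for held_out in factors:
        for name, z in factors.items():
            if name == held_out:
                continue  # leave the swapped factor out of the invariance term
            losses.append(info_nce(z, reencoded[name], temperature))
    return torch.stack(losses).mean()
```

Under this reading, the leave-one-out structure is what encourages physical disentanglement: each factor embedding is trained to survive cycles that perturb any other factor, without being forced to ignore changes to itself.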