Paper Title
Pretraining Image Encoders without Reconstruction via Feature Prediction Loss
Paper Authors
Paper Abstract
This work investigates three methods for calculating loss for autoencoder-based pretraining of image encoders: the commonly used reconstruction loss, the more recently introduced deep perceptual similarity loss, and the feature prediction loss proposed here; the latter turns out to be the most efficient choice. Standard autoencoder pretraining for deep learning tasks is done by comparing the input image and the reconstructed image. Recent work shows that predictions based on embeddings generated by image autoencoders can be improved by training with perceptual loss, i.e., by adding a loss network after the decoding step. So far, autoencoders trained with loss networks have performed an explicit comparison of the original and reconstructed images using the loss network. However, given such a loss network, we show that there is no need for the time-consuming task of decoding the entire image. Instead, we propose to decode the features of the loss network, hence the name "feature prediction loss". To evaluate this method, we perform experiments on three standard publicly available datasets (LunarLander-v2, STL-10, and SVHN) and compare six different procedures for training image encoders (pixel-wise, perceptual similarity, and feature prediction losses, each combined with two variations of image and feature encoding/decoding). The embedding-based prediction results show that encoders trained with feature prediction loss are as good as or better than those trained with the other two losses. Additionally, encoders are significantly faster to train using feature prediction loss than using the other losses. The implementation of the methods used in this work is available online: https://github.com/guspih/Perceptual-Autoencoders
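The contrast between the three losses can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' implementation (which is at the GitHub link above): the toy encoder/decoder architectures, the 128-dimensional embedding, and the choice of early VGG16 layers as the frozen loss network are all assumptions made for demonstration.

```python
# Minimal sketch (assumed architectures, not the paper's) contrasting the
# three autoencoder pretraining losses described in the abstract.
import torch
import torch.nn as nn
import torchvision.models as models

# Frozen "loss network": early VGG16 conv blocks (downloads pretrained
# weights). Output for a 3x64x64 input is 128 channels at 32x32.
loss_net = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:9].eval()
for p in loss_net.parameters():
    p.requires_grad = False

encoder = nn.Sequential(  # toy encoder: 3x64x64 image -> 128-dim embedding
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 128),
)
image_decoder = nn.Sequential(  # decodes embedding back to a 3x64x64 image
    nn.Linear(128, 64 * 16 * 16), nn.ReLU(),
    nn.Unflatten(1, (64, 16, 16)),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
feature_decoder = nn.Sequential(  # decodes embedding straight to loss-net
    nn.Linear(128, 128 * 8 * 8), nn.ReLU(),  # features (128x32x32), so no
    nn.Unflatten(1, (128, 8, 8)),            # image is ever reconstructed
    nn.ConvTranspose2d(128, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 128, 4, stride=2, padding=1),
)

x = torch.rand(8, 3, 64, 64)  # dummy batch of images
z = encoder(x)
mse = nn.MSELoss()

# 1) Pixel-wise reconstruction loss: compare images directly.
recon_loss = mse(image_decoder(z), x)

# 2) Deep perceptual similarity loss: decode an image, then compare
#    loss-network features of the reconstruction and the original.
percep_loss = mse(loss_net(image_decoder(z)), loss_net(x))

# 3) Feature prediction loss (the proposed approach): predict the
#    loss-network features directly, skipping image decoding entirely.
feat_loss = mse(feature_decoder(z), loss_net(x))
```

Note how variant 3 removes the image decoder from the training loop altogether: the embedding only has to explain the loss network's features of the input, which is what makes training faster than with the other two losses.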