Paper Title
Why Do Better Loss Functions Lead to Less Transferable Features?
Paper Authors
Paper Abstract
Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between hidden representations of networks, we find that differences among loss functions are apparent only in the last few layers of the network. We delve deeper into representations of the penultimate layer, finding that different objectives and hyperparameter combinations lead to dramatically different levels of class separation. Representations with higher class separation obtain higher accuracy on the original task, but their features are less useful for downstream tasks. Our results suggest there exists a trade-off between learning invariant features for the original task and features relevant for transfer tasks.
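The abstract refers to centered kernel alignment (CKA) as the measure of representational similarity. Below is a minimal sketch of linear CKA for readers unfamiliar with it; the function name, variable names, and use of NumPy are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two sets of representations.

    x: (n_examples, d1) activations from one layer/network.
    y: (n_examples, d2) activations from another layer/network.
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension across examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(y.T @ x) ** 2
    denominator = np.linalg.norm(x.T @ x) * np.linalg.norm(y.T @ y)
    return numerator / denominator

# Hypothetical usage: compare penultimate-layer activations of two models
# evaluated on the same examples (shapes are n_examples x feature_dim).
acts_a = np.random.randn(512, 2048)
acts_b = np.random.randn(512, 2048)
print(linear_cka(acts_a, acts_b))
```

Because CKA compares layers through example-by-example similarity structure rather than individual neurons, it can be applied to layers of different widths, which is what allows the paper to compare hidden representations across networks trained with different objectives.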