Paper Title

Deep Combinatorial Aggregation

Paper Authors

Yuesong Shen, Daniel Cremers

Paper Abstract

Neural networks are known to produce poor uncertainty estimates, and a variety of approaches have been proposed to remedy this issue. These include deep ensemble, a simple and effective method that achieves state-of-the-art results on uncertainty-aware learning tasks. In this work, we explore a combinatorial generalization of deep ensemble called deep combinatorial aggregation (DCA). DCA creates multiple instances of network components and aggregates their combinations to produce diversified model proposals and predictions. DCA components can be defined at different levels of granularity, and we find that coarse-grain DCAs can outperform deep ensemble for uncertainty-aware learning, in terms of both predictive performance and uncertainty estimation. For fine-grain DCAs, we find that an average parameterization approach named deep combinatorial weight averaging (DCWA) can improve on the baseline training. It is on par with stochastic weight averaging (SWA) but requires no custom training schedule or adaptation of BatchNorm layers. Furthermore, we propose a consistency enforcing loss that helps the training of DCWA and modelwise DCA. We experiment on in-domain, distributional-shift, and out-of-distribution image classification tasks, and empirically confirm the effectiveness of the DCWA and DCA approaches.
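To make the aggregation idea above concrete, here is a minimal PyTorch-style sketch, not the paper's implementation: `DCABlock` holds several independent instances of one network component and samples one uniformly per forward pass, and `dca_predict` averages class probabilities over several sampled combinations. The names `DCABlock` and `dca_predict`, the instance counts, and the toy architecture are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class DCABlock(nn.Module):
    """Holds K independent instances of one network component.

    Each forward pass samples one instance uniformly at random, so
    successive passes realize different component combinations.
    """

    def __init__(self, make_component, num_instances: int = 3):
        super().__init__()
        self.instances = nn.ModuleList(make_component() for _ in range(num_instances))

    def forward(self, x):
        idx = int(torch.randint(len(self.instances), (1,)))
        return self.instances[idx](x)


def dca_predict(model: nn.Module, x: torch.Tensor, num_samples: int = 8) -> torch.Tensor:
    """Aggregate predictions over several sampled component combinations."""
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
        )
    return probs.mean(dim=0)


# Example: a two-block classifier where each block has 3 interchangeable instances,
# giving 3 x 3 = 9 possible component combinations.
model = nn.Sequential(
    DCABlock(lambda: nn.Sequential(nn.Linear(32, 64), nn.ReLU())),
    DCABlock(lambda: nn.Linear(64, 10)),
)
x = torch.randn(4, 32)
mean_probs = dca_predict(model, x)  # (4, 10) averaged class probabilities
```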
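For the fine-grain DCWA variant, the abstract describes an average parameterization over component instances; one plausible reading is entrywise averaging of the trained instances' weights into a single module. The helper below is a hedged sketch under that assumption; `dcwa_average` is an invented name, not the paper's API.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def dcwa_average(instances: nn.ModuleList) -> nn.Module:
    """Collapse K structurally identical instances into one module by
    averaging their parameters entrywise."""
    averaged = copy.deepcopy(instances[0])
    for name, param in averaged.named_parameters():
        stacked = torch.stack(
            [dict(m.named_parameters())[name] for m in instances]
        )
        param.copy_(stacked.mean(dim=0))
    return averaged


# Usage: merge three trained instances of a linear component.
instances = nn.ModuleList(nn.Linear(32, 10) for _ in range(3))
merged = dcwa_average(instances)  # a single nn.Linear with averaged weights
```

Unlike SWA, which averages weights collected along a single training trajectory and typically requires recomputing BatchNorm statistics, the averaging sketched here acts across concurrently trained instances, consistent with the abstract's remark that no custom schedule or BatchNorm adaptation is needed.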
