Improving Predictive Performance and Calibration by Weight Fusion in Semantic Segmentation

Paper Title

Improving Predictive Performance and Calibration by Weight Fusion in Semantic Segmentation

Authors

Timo Sämann, Ahmed Mostafa Hammam, Andrei Bursuc, Christoph Stiller, Horst-Michael Groß

Abstract

Averaging predictions of a deep ensemble of networks is a popular and effective method to improve predictive performance and calibration in various benchmarks and Kaggle competitions. However, the runtime and training cost of deep ensembles grow linearly with the size of the ensemble, making them unsuitable for many applications. Averaging ensemble weights instead of predictions circumvents this disadvantage during inference and is typically applied to intermediate checkpoints of a model to reduce training cost. Albeit effective, only few works have improved the understanding and the performance of weight averaging. Here, we revisit this approach and show that a simple weight fusion (WF) strategy can lead to a significantly improved predictive performance and calibration. We describe what prerequisites the weights must meet in terms of weight space, functional space and loss. Furthermore, we present a new test method (called oracle test) to measure the functional space between weights. We demonstrate the versatility of our WF strategy across state-of-the-art segmentation CNNs and Transformers as well as real-world datasets such as BDD100K and Cityscapes. We compare WF with similar approaches and show our superiority for in- and out-of-distribution data in terms of predictive performance and calibration.
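The difference between averaging predictions and averaging weights can be illustrated with a minimal sketch. It is not the paper's WF strategy: the helper names `average_weights` and `linear_logits`, the toy linear model, and the random checkpoints are all assumptions made for illustration. For a purely linear model the two kinds of averaging give identical logits, while the fused model needs only one forward pass. For nonlinear networks this equality does not hold in general, which is why prerequisites on the weights matter.

```python
import numpy as np


def average_weights(checkpoints):
    """Uniformly average a list of checkpoints (dicts mapping name -> array)."""
    keys = checkpoints[0].keys()
    return {k: np.mean([c[k] for c in checkpoints], axis=0) for k in keys}


def linear_logits(params, x):
    """Toy linear 'network' (hypothetical stand-in for a segmentation model)."""
    return x @ params["W"] + params["b"]


rng = np.random.default_rng(0)

# Three hypothetical checkpoints of the same architecture.
checkpoints = [
    {"W": rng.normal(size=(4, 3)), "b": rng.normal(size=3)} for _ in range(3)
]
x = rng.normal(size=(5, 4))

# Weight averaging: fuse once, then run a single forward pass.
fused = average_weights(checkpoints)
fused_logits = linear_logits(fused, x)

# Prediction averaging: one forward pass per ensemble member.
ensemble_logits = np.mean([linear_logits(c, x) for c in checkpoints], axis=0)

# Equal here only because the toy model is linear in its parameters.
print(np.allclose(fused_logits, ensemble_logits))
```

The inference cost of the fused model stays constant as more checkpoints are added, whereas the cost of the prediction ensemble grows with each member.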
