Paper Title

When Ensembling Smaller Models is More Efficient than Single Large Models

Paper Authors

Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong

Paper Abstract

Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is usually reserved for the largest models, as it is commonly held that increasing model size yields a larger reduction in error than ensembling smaller models. However, we show through experiments on CIFAR-10 and ImageNet that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute, even when the individual models' weights and hyperparameters are highly optimized. Furthermore, this gap in improvement widens as models become larger. This presents an interesting observation: the output diversity provided by ensembling can often be more efficient than training larger models, especially when the models approach the limit of what their dataset can support. Instead of following the common practice of tuning a single large model, one can use ensembles as a more flexible trade-off between a model's inference speed and accuracy. This also potentially eases hardware design, e.g., by making it simpler to parallelize a model across multiple workers for real-time or distributed inference.
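
To make the aggregation step concrete, below is a minimal sketch, not taken from the paper, of the kind of ensembling the abstract describes: several independently initialized small models each produce class logits, and their softmax probabilities are averaged with equal weights before taking the argmax. The `softmax` and `ensemble_predict` helpers and the randomly generated logits are illustrative placeholders; in practice each logits array would come from a separately trained network (e.g., a small ResNet), with the members' combined FLOPs staying below those of a single large model.

```python
# Minimal sketch of prediction averaging for an ensemble (illustrative only).
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_predict(per_model_logits) -> np.ndarray:
    """Equal-weight average of each model's class probabilities, then argmax."""
    mean_probs = np.mean([softmax(l) for l in per_model_logits], axis=0)
    return mean_probs.argmax(axis=-1)


# Toy usage: three hypothetical "small models" score a batch of 4 examples
# over 10 classes (e.g., CIFAR-10). In practice each logits array would come
# from a separately initialized, separately trained network.
rng = np.random.default_rng(0)
per_model_logits = [rng.normal(size=(4, 10)) for _ in range(3)]
print(ensemble_predict(per_model_logits))  # predicted class per example, shape (4,)
```

Under this scheme, inference cost scales linearly with the number of ensemble members, and the members can be evaluated independently in parallel, which is what enables the accuracy-versus-FLOPs and hardware-parallelism trade-offs discussed in the abstract.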
