Paper Title

SADT: Combining Sharpness-Aware Minimization with Self-Distillation for Improved Model Generalization

Paper Authors

Masud An-Nur Islam Fahim, Jani Boutellier

Abstract

Methods for improving deep neural network training times and model generalizability consist of various data augmentation, regularization, and optimization approaches, which tend to be sensitive to hyperparameter settings and make reproducibility more challenging. This work jointly considers two recent training strategies that address model generalizability: sharpness-aware minimization and self-distillation, and proposes the novel training strategy of Sharpness-Aware Distilled Teachers (SADT). The experimental section of this work shows that SADT consistently outperforms previously published training strategies in model convergence time, test-time performance, and model generalizability over various neural architectures, datasets, and hyperparameter settings.
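
The abstract combines two known techniques: a SAM-style two-step (ascend-then-descend) weight update and a self-distillation loss from a teacher network. The sketch below is not the authors' SADT algorithm; it is a minimal, hedged illustration of how those two ingredients are typically combined in PyTorch. All names (`student`, `teacher`, `rho`, `alpha`, `temperature`) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch (not the paper's reference implementation) of a single
# training step that combines a SAM-style perturbation with a
# self-distillation loss, assuming a standard PyTorch setup.
import torch
import torch.nn.functional as F


def sam_distillation_step(student, teacher, optimizer, x, y,
                          rho=0.05, alpha=0.5, temperature=4.0):
    """One training step: SAM ascent/descent plus KL distillation from a teacher."""
    # 1) Soft targets from the (frozen) teacher.
    with torch.no_grad():
        teacher_logits = teacher(x)

    def total_loss():
        logits = student(x)
        ce = F.cross_entropy(logits, y)
        kd = F.kl_div(
            F.log_softmax(logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        return (1 - alpha) * ce + alpha * kd

    # 2) First backward pass: gradient at the current weights.
    optimizer.zero_grad()
    total_loss().backward()

    # 3) SAM ascent step: perturb weights toward locally higher loss.
    grads = [p.grad for p in student.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in student.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)  # move to the nearby "sharp" point
            perturbations.append(e)

    # 4) Second backward pass: gradient at the perturbed weights.
    optimizer.zero_grad()
    loss = total_loss()
    loss.backward()

    # 5) Undo the perturbation, then apply the actual optimizer update.
    with torch.no_grad():
        for p, e in zip(student.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    return loss.item()
```

In this illustrative setup, `teacher` could be an earlier snapshot or moving average of `student` (the self-distillation view), `rho` controls the SAM neighborhood size, and `alpha` weights the distillation term against the cross-entropy loss; how SADT actually couples the teacher update with the sharpness-aware step is specified in the paper itself.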
