Paper Title
Deep Incubation: Training Large Models by Divide-and-Conquering
Paper Authors
Paper Abstract
Recent years have witnessed remarkable success of large deep learning models. However, training these models is challenging due to high computational costs, painfully slow convergence, and overfitting issues. In this paper, we present Deep Incubation, a novel approach that enables the efficient and effective training of large models by dividing them into smaller sub-modules that can be trained separately and assembled seamlessly. A key challenge in implementing this idea is ensuring the compatibility of the independently trained sub-modules. To address this issue, we first introduce a global, shared meta model, which is leveraged to implicitly link all the modules together, and can be designed as an extremely small network with negligible computational overhead. Then we propose a module incubation algorithm, which trains each sub-module to replace the corresponding component of the meta model and accomplish a given learning task. Despite its simplicity, our approach effectively encourages each sub-module to be aware of its role in the target large model, such that the finally learned sub-modules can collaborate with each other smoothly after being assembled. Empirically, our method outperforms end-to-end (E2E) training in terms of both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves the accuracy by 2.7% on ImageNet, or achieves similar performance with 4x less training time. Notably, the gains are significant for downstream tasks as well (e.g., object detection and image segmentation on COCO and ADE20K). Code is available at https://github.com/LeapLabTHU/Deep-Incubation.
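The divide-and-conquer scheme described in the abstract can be sketched in a few lines. The sketch below is an illustrative simplification, not the paper's implementation: models are represented as chains of plain Python callables, the meta model is a chain of tiny modules, and actual gradient-based training is omitted. The function names (`compose`, `make_hybrid`, `assemble`) are hypothetical and only serve to show how each target sub-module is incubated inside the meta model and how the trained sub-modules are assembled at the end.

```python
def compose(modules):
    """Chain a list of modules into a single feed-forward model."""
    def model(x):
        for m in modules:
            x = m(x)
        return x
    return model


def make_hybrid(meta_modules, target_module, i):
    """Build the hybrid model used to incubate sub-module i.

    Every position except i is filled by the (frozen) meta model's tiny
    modules; position i is taken by the target sub-module being trained.
    Optimizing this hybrid on the task is what makes sub-module i aware
    of its role in the full target model.
    """
    hybrid = list(meta_modules)
    hybrid[i] = target_module
    return compose(hybrid)


def assemble(target_modules):
    """After incubation, assemble the independently trained sub-modules
    into the full target model."""
    return compose(target_modules)
```

Because every hybrid shares the same meta modules around its trained slot, the sub-modules learn mutually compatible input/output interfaces even though they never see each other during training, which is why the final assembly works without joint fine-tuning being strictly required.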