Paper Title
MogaNet: Multi-order Gated Aggregation Network
Paper Authors
Paper Abstract
By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals a representation bottleneck of modern ConvNets: expressive interactions are not effectively encoded as the kernel size increases. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and adaptively contextualized. MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D & 3D human pose estimation, and video prediction. Notably, MogaNet hits 80.0% and 87.8% top-1 accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
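To make the "multi-order gated aggregation" idea in the abstract concrete, below is a minimal PyTorch sketch: a context branch gathers features with depth-wise convolutions at several dilation rates (approximating interactions of different "orders"), and a gating branch adaptively re-weights the aggregated context. The kernel sizes, dilation rates, channel handling, and layer names here are illustrative assumptions, not the paper's exact module; refer to the official repository above for the actual MogaNet design.

```python
import torch
import torch.nn as nn


class MultiOrderGatedAggregation(nn.Module):
    """Illustrative sketch of a gated aggregation block.

    Context at several receptive-field "orders" is gathered with
    depth-wise convolutions of increasing dilation, then modulated
    elementwise by a gating branch. Hyperparameters are assumptions
    for illustration only.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Gating branch: 1x1 projection producing the gate.
        self.gate = nn.Conv2d(dim, dim, kernel_size=1)
        # Context branch: depth-wise convolutions at multiple
        # dilation rates stand in for "multi-order" aggregation.
        self.dw_local = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_mid = nn.Conv2d(dim, dim, 5, padding=4, dilation=2, groups=dim)
        self.dw_global = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.proj_in = nn.Conv2d(dim, dim, kernel_size=1)
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.proj_in(x)
        # Gather context at three orders and sum them.
        ctx = self.dw_local(x) + self.dw_mid(x) + self.dw_global(x)
        # Gated aggregation: the gate adaptively re-weights context.
        out = self.act(self.gate(x)) * self.act(ctx)
        return shortcut + self.proj_out(out)


if __name__ == "__main__":
    block = MultiOrderGatedAggregation(dim=64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Because every spatial operator is depth-wise and the gate is a cheap 1x1 convolution, a block of this shape keeps parameter and FLOP counts low, which is consistent with the complexity-performance trade-off the abstract emphasizes.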