Paper Title
A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis
Paper Authors
Paper Abstract
The Strong Lottery Ticket Hypothesis (SLTH) stipulates the existence of a subnetwork within a sufficiently overparametrized (dense) neural network that -- when initialized randomly and without any training -- achieves the accuracy of a fully trained target network. Recent works by da Cunha et al. (2022) and Burkholz (2022) demonstrate that the SLTH can be extended to translation equivariant networks -- i.e. CNNs -- with the same level of overparametrization as needed for SLTs in dense networks. However, modern neural networks are capable of incorporating more than just translation symmetry, and incorporating more general symmetries, such as rotation and permutation, into equivariant architectures has been a powerful design principle. In this paper, we generalize the SLTH to functions that preserve the action of the group $G$ -- i.e. $G$-equivariant networks -- and prove, with high probability, that one can approximate any $G$-equivariant network of fixed width and depth by pruning a randomly initialized overparametrized $G$-equivariant network to a $G$-equivariant subnetwork. We further prove that our prescribed overparametrization scheme is optimal and provides a lower bound on the number of effective parameters as a function of the error tolerance. We develop our theory for a large range of groups, including subgroups of the Euclidean group $\text{E}(2)$ and of the symmetric group $G \leq \mathcal{S}_n$ -- allowing us to find SLTs for MLPs, CNNs, $\text{E}(2)$-steerable CNNs, and permutation equivariant networks as specific instantiations of our unified framework. Empirically, we verify our theory by pruning overparametrized $\text{E}(2)$-steerable CNNs, $k$-order GNNs, and message-passing GNNs to match the performance of trained target networks.
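The core mechanism behind SLTH-style results is a subset-sum argument: each target weight is approximated by keeping (i.e. not pruning) a subset of random weights whose sum lands close to it, and for equivariant networks this selection happens on the shared parameters that define an equivariant layer, so the pruned subnetwork remains $G$-equivariant by construction. The sketch below is a minimal illustration of that idea, not the paper's construction: it uses a hypothetical permutation-equivariant DeepSets-style layer with two shared parameters, approximates each one by a brute-force subset-sum over random candidates, and checks both the approximation error and that equivariance is preserved after pruning.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def subset_sum_approx(target, candidates, budget=12):
    """Pick a subset of random candidate weights whose sum approximates the
    target scalar; all other candidates are 'pruned' (masked to zero)."""
    idx = np.argsort(np.abs(candidates))[::-1][:budget]  # cap brute-force size
    best_combo, best_err = (), abs(target)
    for r in range(1, len(idx) + 1):
        for combo in itertools.combinations(idx, r):
            err = abs(target - candidates[list(combo)].sum())
            if err < best_err:
                best_err, best_combo = err, combo
    mask = np.zeros_like(candidates)
    mask[list(best_combo)] = 1.0
    return mask

def perm_equivariant_layer(X, a, b):
    """DeepSets-style permutation-equivariant linear map on a set of scalars:
    f(X)_i = a * X_i + b * mean(X). Only (a, b) are free (shared) parameters."""
    return a * X + b * X.mean()

# Target equivariant layer we want to recover by pruning alone.
a_target, b_target = 0.7, -0.3

# Overparametrized random layer: several random copies of each shared parameter.
width = 12
a_rand = rng.uniform(-1, 1, width)
b_rand = rng.uniform(-1, 1, width)

# Prune: keep only a subset-sum of random weights per shared parameter.
a_hat = (subset_sum_approx(a_target, a_rand) * a_rand).sum()
b_hat = (subset_sum_approx(b_target, b_rand) * b_rand).sum()

X = rng.normal(size=5)
P = rng.permutation(5)
out_target = perm_equivariant_layer(X, a_target, b_target)
out_pruned = perm_equivariant_layer(X, a_hat, b_hat)
print("approximation error:", np.max(np.abs(out_target - out_pruned)))
# Pruning shared parameters keeps the layer permutation-equivariant:
print("equivariance gap:",
      np.max(np.abs(perm_equivariant_layer(X[P], a_hat, b_hat) - out_pruned[P])))
```

The approximation error shrinks as the random width grows (more candidates for the subset-sum), while the equivariance gap is zero up to floating point, since pruning acts only on the shared parameters and never breaks the weight-sharing pattern that encodes the group action.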