Paper Title
Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination
Paper Authors
Paper Abstract
While post-training model compression can greatly reduce the inference cost of a deep neural network, uncompressed training still consumes a huge amount of hardware resources, runtime, and energy. It is highly desirable to directly train a compact neural network from scratch with low memory and low computational cost. Low-rank tensor decomposition is one of the most effective approaches to reduce the memory and computing requirements of large neural networks. However, directly training a low-rank tensorized neural network is very challenging because it is hard to determine a proper tensor rank {\it a priori}, which controls the model complexity and compression ratio during training. This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks. We first develop a flexible Bayesian model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor train, and tensor-train matrix) to compress the neural network parameters during training. This model can automatically determine the tensor ranks inside a nonlinear forward model, which is beyond the capability of existing Bayesian tensor methods. We further develop a scalable stochastic variational inference solver to estimate the posterior density for large-scale problems in training. Our work provides the first general-purpose rank-adaptive framework for end-to-end tensorized training. Our numerical results on various neural network architectures show orders-of-magnitude parameter reduction with little accuracy loss (or even better accuracy) during training. Specifically, on a very large deep learning recommendation system with over $4.2\times 10^9$ model parameters, our method automatically reduces the number of trainable variables to only $1.6\times 10^5$ during training (i.e., a $2.6\times 10^4\times$ reduction) while achieving almost the same accuracy.
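To make the tensor-train-matrix (TT-matrix) format concrete, below is a minimal sketch of a linear layer whose weight matrix is stored as TT cores, assuming PyTorch. The class name `TTLinear` and its arguments are illustrative assumptions, not the authors' implementation, and the Bayesian automatic rank determination described in the abstract is not reproduced here (the TT-ranks are fixed by hand).

```python
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """A linear layer whose weight matrix is stored in TT-matrix format.

    A weight W of shape (prod(in_modes), prod(out_modes)) is represented by
    d small cores G_k of shape (r_k, m_k, n_k, r_{k+1}) with r_0 = r_d = 1.
    Storage drops from prod(m) * prod(n) to sum_k r_k * m_k * n_k * r_{k+1},
    so low TT-ranks translate directly into low memory during training.
    """

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1  # boundary ranks are always 1
        self.cores = nn.ParameterList(
            nn.Parameter(0.1 * torch.randn(ranks[k], in_modes[k],
                                           out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        )

    def full_weight(self):
        """Contract the TT cores back into a dense weight matrix.

        Done densely here only for clarity; an efficient implementation
        would contract the input with one core at a time instead.
        """
        w = self.cores[0]  # shape (1, m_0, n_0, r_1)
        for core in self.cores[1:]:
            # Merge the next core's input/output modes into the running tensor.
            w = torch.einsum('aijb,bklc->aikjlc', w, core)
            a, i, k, j, l, c = w.shape
            w = w.reshape(a, i * k, j * l, c)
        return w.squeeze(0).squeeze(-1)  # (prod(in_modes), prod(out_modes))

    def forward(self, x):
        return x @ self.full_weight()


# A 256x256 layer: dense storage is 65,536 weights; this TT version stores
# 1*4*4*8 + 8*8*8*8 + 8*8*8*1 = 4,736 trainable parameters.
layer = TTLinear(in_modes=(4, 8, 8), out_modes=(4, 8, 8), ranks=(1, 8, 8, 1))
y = layer(torch.randn(2, 256))
print(y.shape, sum(p.numel() for p in layer.parameters()))
```

In this toy configuration the parameter count falls from 65,536 to 4,736 (roughly $14\times$); the same arithmetic at recommendation-system scale is what takes $4.2\times 10^9$ parameters down to $1.6\times 10^5$, since $4.2\times 10^9 / 1.6\times 10^5 \approx 2.6\times 10^4$. The role of the paper's Bayesian model is to choose the ranks (here hard-coded as `(1, 8, 8, 1)`) automatically during training.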