Paper Title

Nonparametric Learning of Two-Layer ReLU Residual Units

Paper Authors

Wang, Zhunxuan, He, Linyun, Lyu, Chunchuan, Cohen, Shay B.

Paper Abstract

We describe an algorithm that learns two-layer residual units using rectified linear unit (ReLU) activation: suppose the input $\mathbf{x}$ is from a distribution with support space $\mathbb{R}^d$ and the ground-truth generative model is a residual unit of this type, given by $\mathbf{y} = \boldsymbol{B}^\ast\left[\left(\boldsymbol{A}^\ast\mathbf{x}\right)^+ + \mathbf{x}\right]$, where the ground-truth network parameters $\boldsymbol{A}^\ast \in \mathbb{R}^{d\times d}$ represent a full-rank matrix with nonnegative entries, $\boldsymbol{B}^\ast \in \mathbb{R}^{m\times d}$ is full-rank with $m \geq d$, and for $\boldsymbol{c} \in \mathbb{R}^d$, $[\boldsymbol{c}^{+}]_i = \max\{0, c_i\}$. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. Following this objective landscape, learning residual units from finite samples can be formulated using convex optimization of a nonparametric function: for each layer, we first formulate the corresponding empirical risk minimization (ERM) as a positive semi-definite quadratic program (QP); then we show the solution space of the QP can be equivalently determined by a set of linear inequalities, which can then be efficiently solved by linear programming (LP). We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets.
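To make the setup concrete, below is a minimal NumPy sketch of the generative model from the abstract, $\mathbf{y} = \boldsymbol{B}^\ast\left[\left(\boldsymbol{A}^\ast\mathbf{x}\right)^+ + \mathbf{x}\right]$. The dimensions d and m, the sample size n, and the sampling distributions are hypothetical choices for illustration; this sketches only the data-generating process that such a learner would consume, not the authors' QP/LP learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 4, 6, 1000  # hypothetical input dim, output dim (m >= d), sample size

# Ground-truth parameters as assumed in the abstract:
# A* is d x d, full-rank with nonnegative entries (full rank almost surely here);
# B* is m x d and full-rank with m >= d.
A_star = rng.uniform(0.1, 1.0, size=(d, d))
B_star = rng.standard_normal((m, d))

# Inputs x drawn from a distribution supported on R^d.
X = rng.standard_normal((n, d))

# Residual unit: h = (A* x)^+ + x, with (.)^+ the entrywise ReLU; y = B* h.
H = np.maximum(A_star @ X.T, 0.0) + X.T   # shape (d, n)
Y = (B_star @ H).T                        # responses, shape (n, m)
```

Given samples (X, Y) generated this way, the paper's layer-wise procedure recovers $\boldsymbol{B}^\ast$ and $\boldsymbol{A}^\ast$ by solving the convex programs described above.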
