Paper Title


Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods

Paper Authors

Alexander Shevchenko, Kevin Kögler, Hamed Hassani, Marco Mondelli

Paper Abstract

Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
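
As a concrete illustration (not taken from the paper), the sketch below sets up the kind of model the abstract describes: a two-layer autoencoder in the proportional regime, where the latent size m is a fixed fraction of the input dimension d, trained by gradient descent on Gaussian data. It assumes PyTorch; tanh is used as a smooth surrogate for the sign activation so that plain gradient methods apply, and the dimensions and learning rate are arbitrary choices for demonstration.

```python
# Minimal sketch of a two-layer autoencoder in the proportional regime:
# latent dimension m scales linearly with input dimension d (here m = d/2,
# i.e. compression rate 1/2). Data are i.i.d. Gaussian, matching the
# theoretical setting of the paper; tanh stands in for the non-smooth
# sign activation discussed in the abstract.
import torch

torch.manual_seed(0)
d, m, n = 128, 64, 4096            # input dim, latent dim (m/d fixed), samples

x = torch.randn(n, d)              # Gaussian source

encoder = torch.nn.Linear(d, m, bias=False)
decoder = torch.nn.Linear(m, d, bias=False)
opt = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.05
)

for step in range(1000):
    z = torch.tanh(encoder(x))             # smooth stand-in for sign(.)
    x_hat = decoder(z)
    loss = ((x - x_hat) ** 2).mean()       # empirical proxy for the population risk
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final reconstruction MSE: {loss.item():.4f}")
```

The mean squared reconstruction error printed at the end is the empirical counterpart of the population risk whose minimizers the paper characterizes; shrinking or growing m relative to d changes the compression rate while keeping the ratio m/d fixed as both grow.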
