Title

Generalization and Overfitting in Matrix Product State Machine Learning Architectures

Authors

Artem Strashko, E. Miles Stoudenmire

Abstract

While overfitting and, more generally, double descent are ubiquitous in machine learning, increasing the number of parameters of the most widely used tensor network, the matrix product state (MPS), has generally led to monotonic improvement of test performance in previous studies. To better understand the generalization properties of architectures parameterized by MPS, we construct artificial data which can be exactly modeled by an MPS and train the models with different numbers of parameters. We observe model overfitting for one-dimensional data, but also find that for more complex data overfitting is less significant, while with MNIST image data we do not find any signatures of overfitting. We speculate that the generalization properties of MPS depend on the properties of the data: with one-dimensional data (for which the MPS ansatz is the most suitable) MPS is prone to overfitting, while with more complex data which cannot be fit by MPS exactly, overfitting may be much less significant.
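
As an illustration of what "number of parameters" means for an MPS, the following is a minimal sketch and not the authors' code. It assumes an open-boundary MPS with a uniform bond dimension and randomly initialized tensors; the function names `random_mps`, `amplitude`, and `n_params` are hypothetical helpers introduced only for this example. It shows that the parameter count is controlled by the bond dimension, which is the knob one would vary when comparing models of different sizes.

```python
import numpy as np

def random_mps(n_sites, phys_dim, bond_dim, seed=0):
    """Random open-boundary MPS (illustrative assumption, not from the paper).

    Each site holds a tensor of shape (left_bond, phys_dim, right_bond);
    the outermost bonds have dimension 1.
    """
    rng = np.random.default_rng(seed)
    tensors = []
    for i in range(n_sites):
        left = 1 if i == 0 else bond_dim
        right = 1 if i == n_sites - 1 else bond_dim
        tensors.append(rng.normal(size=(left, phys_dim, right)) / np.sqrt(bond_dim))
    return tensors

def amplitude(tensors, config):
    """Contract the MPS for one configuration (one physical index per site)."""
    vec = np.ones(1)
    for tensor, s in zip(tensors, config):
        # Fixing the physical index leaves a (left_bond, right_bond) matrix.
        vec = vec @ tensor[:, s, :]
    return vec.item()

def n_params(tensors):
    """Total number of entries across all site tensors."""
    return sum(t.size for t in tensors)

if __name__ == "__main__":
    for chi in (2, 4, 8):
        mps = random_mps(n_sites=8, phys_dim=2, bond_dim=chi)
        print(chi, n_params(mps), amplitude(mps, (0, 1, 0, 1, 1, 0, 0, 1)))
```

For interior sites the tensor size grows quadratically with the bond dimension, so doubling the bond dimension roughly quadruples the number of trainable parameters.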
