Title
Fundamental limits to learning closed-form mathematical models from data
Authors
Abstract
Given a finite and noisy dataset generated with a closed-form mathematical model, when is it possible to learn the true generating model from the data alone? This is the question we investigate here. We show that this model-learning problem displays a transition from a low-noise phase in which the true model can be learned, to a phase in which the observation noise is too high for the true model to be learned by any method. Both in the low-noise phase and in the high-noise phase, probabilistic model selection leads to optimal generalization to unseen data. This is in contrast to standard machine learning approaches, including artificial neural networks, which in this particular problem are limited, in the low-noise phase, by their ability to interpolate. In the transition region between the learnable and unlearnable phases, generalization is hard for all approaches including probabilistic model selection.
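The model-selection setting described above can be illustrated with a minimal sketch. This is not the paper's actual method; it is a hypothetical toy example that generates noisy data from a closed-form model, y = 2·sin(x), and uses BIC-style probabilistic model selection to pick among single-parameter candidate expressions. In the low-noise regime, the true generating model is recovered:

```python
import math
import random

random.seed(0)

# Hypothetical setup (illustration only): noisy observations of a
# closed-form model y = 2*sin(x) in the low-noise phase.
n = 200
xs = [random.uniform(-3, 3) for _ in range(n)]
sigma = 0.1  # low observation noise
ys = [2 * math.sin(x) + random.gauss(0, sigma) for x in xs]

# Candidate closed-form models, each linear in one parameter a.
candidates = {
    "a*sin(x)": math.sin,
    "a*x":      lambda x: x,
    "a*x^3":    lambda x: x ** 3,
}

def bic(basis):
    # Least-squares fit of y ~ a * basis(x), then BIC with k = 1 parameter:
    # BIC = n * ln(RSS / n) + k * ln(n).
    phis = [basis(x) for x in xs]
    a = sum(p * y for p, y in zip(phis, ys)) / sum(p * p for p in phis)
    rss = sum((y - a * p) ** 2 for p, y in zip(phis, ys))
    return n * math.log(rss / n) + 1 * math.log(n)

best = min(candidates, key=lambda name: bic(candidates[name]))
print(best)
```

Raising `sigma` far enough would push the problem toward the high-noise phase the abstract describes, where no method can single out the true model; the transition region between the two phases is where selection becomes hardest.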