Paper Title
Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery Perspective
Paper Authors
Paper Abstract
The notion of generalization in classical Statistical Learning is often attached to the postulate that data points are independent and identically distributed (IID) random variables. While relevant in many applications, this postulate may not hold in general, encouraging the development of learning frameworks that are robust to non-IID data. In this work, we consider the regression problem from an Optimal Recovery perspective. Relying on a model assumption comparable to choosing a hypothesis class, a learner aims to minimize the worst-case error, without recourse to any probabilistic assumption on the data. We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces. Then, for any Hilbert space, we show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point of view, as long as the hypothesis class is linear. Interestingly, this formula coincides with kernel ridgeless regression in some cases, proving that minimizing the average error and the worst-case error can yield the same solution. We provide numerical experiments in support of our theoretical findings.
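For concreteness, the worst-case error the abstract refers to can be stated in standard Optimal Recovery notation (which may differ from the paper's own): given linear observations $y_i = \lambda_i(f)$, $i = 1, \dots, m$, of an unknown $f$ in a Hilbert space $H$, and a model set $\mathcal{K} \subset H$ playing the role of the hypothesis class, a recovery map $\Delta$ is assessed via
$$\operatorname{err}(\Delta) = \sup_{f \in \mathcal{K}} \big\| f - \Delta\big(\lambda_1(f), \dots, \lambda_m(f)\big) \big\|_H,$$
and an optimal recovery map minimizes this supremum over all maps $\Delta$. The kernel ridgeless regression estimator mentioned in the abstract is the standard minimum-RKHS-norm interpolant, i.e., the limit of kernel ridge regression as the regularization parameter tends to zero. The sketch below implements only that standard estimator; it is an illustration under assumptions, with the Gaussian kernel, the `gamma` value, and all function names chosen by us rather than taken from the paper.

```python
# Illustrative sketch of kernel ridgeless regression, the standard
# minimum-RKHS-norm interpolant. This is NOT code from the paper: the
# Gaussian kernel, the gamma value, and all names are our assumptions.
import numpy as np

def gaussian_kernel(A, B, gamma=10.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_ridgeless(X, y, gamma=10.0):
    """Interpolating coefficients c solving K c = y (ridge parameter -> 0)."""
    K = gaussian_kernel(X, X, gamma)
    # lstsq copes with the near-singular Gram matrices that appear as the
    # regularization of kernel ridge regression is driven to zero.
    c, *_ = np.linalg.lstsq(K, y, rcond=None)
    return c

def predict(X_train, c, X_new, gamma=10.0):
    """Evaluate f(x) = sum_i c_i k(x, x_i) at new points."""
    return gaussian_kernel(X_new, X_train, gamma) @ c

# Toy usage: the fit interpolates the training labels exactly.
X = np.linspace(-1.0, 1.0, 12).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
c = fit_ridgeless(X, y)
print(np.allclose(predict(X, c, X), y, atol=1e-6))  # expected: True
```

Run as-is, the final check prints True: the Gaussian Gram matrix on distinct points is positive definite, so the interpolation constraints are met exactly, which is the sense in which ridgeless regression fits the data with zero training error while keeping the smallest RKHS norm.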