Paper Title
Understanding Generalization via Leave-One-Out Conditional Mutual Information
Paper Authors
Paper Abstract
We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a constant factor of two. As an application, we analyze the population risk of the one-inclusion graph algorithm, a general-purpose transductive learning algorithm for VC classes in the realizable setting. Using leave-one-out CMI, we match the optimal bound for learning VC classes in the realizable setting, answering an open challenge raised by Steinke and Zakynthinou (2020). Finally, in order to understand the role of leave-one-out CMI in studying generalization, we place leave-one-out CMI in a hierarchy of measures, with a novel unconditional mutual information at the root. For 0-1 loss and interpolating learning algorithms, this mutual information is observed to be precisely the risk.
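The leave-one-out CMI described in the abstract can be sketched as follows. This is a plausible formalization based only on the summary above (a supersample of $n+1$ i.i.d. points, with $n$ training points chosen at random without replacement); the paper's exact definition may differ in its conditioning details.

```latex
% Supersample of n+1 i.i.d. points from the data distribution D,
% and a uniformly random left-out index U; the algorithm A trains
% on the n remaining points.
\tilde{Z} = (Z_1, \dots, Z_{n+1}) \sim \mathcal{D}^{n+1},
\qquad U \sim \mathrm{Unif}\{1, \dots, n+1\},
\qquad \tilde{Z}_{-U} = (Z_i)_{i \neq U}.

% Leave-one-out CMI: the mutual information between the algorithm's
% output (or a summary of it) and the left-out index U, conditional
% on the supersample.
\mathrm{CMI}^{\mathrm{LOO}}(A)
  = I\big(A(\tilde{Z}_{-U});\, U \,\big|\, \tilde{Z}\big).
```

Conditioning on $\tilde{Z}$ means the only remaining uncertainty is *which* point was left out, which is what ties this quantity to the classical leave-one-out error estimate for interpolating algorithms.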