Paper Title
Upper Bounds on the Generalization Error of Private Algorithms for Discrete Data
Paper Authors
Paper Abstract
In this work, we study the generalization capability of algorithms from an information-theoretic perspective. It has been shown that the expected generalization error of an algorithm is bounded from above by a function of the relative entropy between the conditional probability distribution of the algorithm's output hypothesis, given the dataset with which it was trained, and its marginal probability distribution. We build upon this fact and introduce a mathematical formulation to obtain upper bounds on this relative entropy. Assuming that the data is discrete, we then develop a strategy using this formulation, based on the method of types and typicality, to find explicit upper bounds on the generalization error of stable algorithms, i.e., algorithms that produce similar output hypotheses given similar input datasets. In particular, we show the bounds obtained with this strategy for the case of $\varepsilon$-DP and $\mu$-GDP algorithms.
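For reference, a standard form of the bound the abstract alludes to is the mutual-information generalization bound for sub-Gaussian losses, sketched below in LaTeX together with the stability property that differential privacy provides. The notation ($W$ for the output hypothesis, $S$ for the $n$-sample training dataset, $\sigma$ for the sub-Gaussianity parameter) is generic and assumed here; it is not necessarily the paper's own statement.

% Sketch: a commonly cited information-theoretic generalization bound,
% assuming the loss \ell(w, Z) is \sigma-sub-Gaussian under the data
% distribution. The expected generalization error is controlled by the
% mutual information between the output hypothesis W and the dataset S:
\[
  \bigl| \mathbb{E}\,[\mathrm{gen}(W, S)] \bigr|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
  \qquad
  I(W; S) \;=\; \mathbb{E}_{P_S}\!\left[ D\!\left( P_{W \mid S} \,\middle\|\, P_W \right) \right].
\]
% I(W;S) is exactly the expected relative entropy between the conditional
% distribution P_{W|S} and the marginal P_W that the abstract upper-bounds.
%
% Stability enters through privacy: an \varepsilon-DP algorithm guarantees
% that, for any two neighboring datasets s and s' (differing in one entry),
% the log-likelihood ratio of its outputs is bounded by \varepsilon, hence
\[
  D\!\left( P_{W \mid S = s} \,\middle\|\, P_{W \mid S = s'} \right) \;\le\; \varepsilon,
\]
% which is the kind of closeness between output distributions that a
% method-of-types strategy can turn into explicit generalization bounds.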